The #1 bug in my ~14 years of Linux

Source: beranger.org


Submitted by yangyi on 2008-09-23 21:54:30


June 02, 2008 at 06:22:55 GMT

I needed a break before I could write about this calmly. Let's just say you drag a few files on the desktop, you drop them 100 pixels away... and they're gone forever!


That was indeed the most outrageous system bug I had in my whole life. The worst bug in Linux ever. Worse than any bug I ever had in all the operating systems on Earth. Less understandable than anything that happened to me under Windows! (I won't count an MS-DOS situation where the 2nd FAT copy was empty, and Norton something "corrected" the first FAT copy by nullifying it too!)


It was a nice, placid, unspectacular Sunday. In other words, this happened yesterday.


I woke up my laptop running Debian testing, started to browse a little with Iceweasel 2.0.0.14, and was preparing to write something in OpenOffice.org. But before that, as I had a dozen files on the desktop, I thought I could rearrange them a bit.


I dragged one of the files (a tarball) something like 70...100 pixels up and to the right. I dropped it. The file disappeared!


I still hadn't realized what had happened; I probably thought the file would show up after a refresh. So I started dragging another file (a script) something like 70...100 pixels up, and then I dropped it. This file disappeared too!


Now I started to believe in shit happening to me.


There was no recycle bin around, nor any other folder near the drop area. I looked for the second file with Nautilus. I even used the search function. I then fired up a terminal and searched for it. It was nowhere.


For privacy reasons, let's say the file name was "myscript-0.05". Here's what find returned:

jujube:~# find / -name myscript-0.05
/home/radu/Bureau/myscript-0.05

I should add that this home partition once held an installation in French, hence the "Bureau" thing. Since then I have only used English-language installations, and "Desktop" is a symlink to "Bureau":

radu@jujube:~$ ls -l
total 26852
drwxr-xr-x 6 radu radu 4096 2008-06-01 22:31 Bureau
lrwxrwxrwx 1 radu radu 7 2008-03-30 19:45 Desktop -> Bureau/
[...]

So... there is a file with that name on my desktop, right? ("Bureau" or "Desktop", it's the same.) Where is it then?!

jujube:~# ls -l /home/radu/Bureau/*0.05
ls: cannot access /home/radu/Bureau/myscript-0.05: No such file or directory
jujube:~#
jujube:~# ls -l /home/radu/Desktop/*0.05
ls: cannot access /home/radu/Desktop/myscript-0.05: No such file or directory
jujube:~#
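
One way to read the output above: find matches names as the directory lists them, while ls -l has to look up each file's metadata, so a name that is still listed but whose inode is gone would produce exactly this mismatch. The following is only a minimal sketch of that check in Python, not something from the original session; the function name dangling_entries is made up for illustration, and on a healthy filesystem it should return an empty list.

```python
import os
import tempfile


def dangling_entries(directory):
    """Return names that the directory lists but that lstat cannot resolve.

    Hypothetical helper: it mimics the find-versus-ls mismatch by comparing
    the directory listing with a per-file metadata lookup.
    """
    broken = []
    for name in os.listdir(directory):      # what the directory claims to contain
        try:
            # lstat looks at the entry itself, so a dangling symlink is not reported
            os.lstat(os.path.join(directory, name))
        except FileNotFoundError:           # listed, but "No such file or directory"
            broken.append(name)
    return sorted(broken)


if __name__ == "__main__":
    # Demonstration on a scratch directory; a sane filesystem reports nothing.
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "myscript-0.05"), "w").close()
        print(dangling_entries(d))
```

Run against a directory such as /home/radu/Bureau, any name it returns would be an entry in the same state as the lost script: visible in the listing, impossible to open.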

What?!


I needed a fsck, I needed it badly, and I needed to reboot.


The filesystem check complained about a zero dtime. It then rebooted and it started to recover the journal, just to complain about an orphaned inode, a few "illegal inodes" in "orphaned inode lists", and a few entries having "deleted inodes". The errors were fixed, I was told, no confirmation required.


That was easy, I thought.


Back into a running X system, I looked for the said script (I was also looking for the first file, but as I didn't remember its name, I was just looking for a spurious tarball).


Neither of the two files was back! They were actually "properly deleted", although I had never deleted them! Furthermore, a text file I had been writing with vi just before the incident, and that I had saved just before the reboot... it was lost too!


I have never experienced such shit. No, I didn't lose important data. I didn't lose a partition. Heck, I lost 2 files, plus an extra text file! It wasn't about the magnitude of the loss, it was about HOW I lost them!


Please, don't tell me this is "because you were using Debian testing". Don't be that stupid, dear reader. Debian testing is much more stable than many "released" distros. And I wasn't installing anything: no updates had been applied in the last week. I just checked what updates were available and decided I didn't need them, as there was nothing of importance.


Let me put it again, maybe you missed it: I dragged and dropped two files on the desktop, for a short distance; I dropped them where I could see them; and they were deleted!


They were actually so poorly deleted, that the filesystem was corrupted, and it needed a fsck on boot.


Who is to blame? Should I blame Nautilus 2.20? Should I blame libgnomevfs2 2.20? Should I blame ext3? Should I blame the kernel?


Note that this was GNOME 2.20, so it used the "classical" GnomeVFS, something that should be rock-solid, tested and stable. (What to expect from the new GVFS in GNOME 2.22 then?) And Nautilus is a core GNOME component too. It should just work, right?


But maybe it had nothing to do with GNOME. Oh, but isn't ext3 the Linux-specific journaling filesystem? Isn't it the most trustworthy choice on Linux? (It turns out that's not the case, if we think of the fsync(2)/fdatasync(2) bug revealed by Firefox 3.)


As for the kernel... we're now 15 years into Linux, right? We can assume it's a mature kernel, right?


I still believe it's a Nautilus bug. After all, I wasn't trying to rename, move in the filesystem hierarchy, or delete the file. I was just moving it a little on the desktop!


OK, but how do I report this bug? And how could anyone reproduce it? It's more like a close encounter with an alien civilization!


This is, my friends, the shitty state of Linux, after more than 15 full years of life. This is also the most idiotic bug I have encountered in my almost 14 years of dealing with Linux, and about 22 years with computers.


To put a cherry on the cake, the system simply crashed with a black screen, all of a sudden, after 4 hours of moderate usage: OpenOffice.org, Iceweasel and Rhythmbox! I had to hold the OFF button for 5 seconds, then start it up again.


2 more hours later, I hibernated it, as I didn't want to do anything more. Now I am scared to start it again...


No, the "testing" in the name doesn't mean it should be doing that. And it's not running the latest GNOME 2.22 plus gazillions of new system alpha and beta features, as it's not Fedora 9. It's Debian Lenny and it runs GNOME 2.20.


With crap like this, tell me again that you need spinning cubes or shiny plasmoids, and I'll chop your head off with the first axe I see.


And I am calm now.


No, Linux is not ready for ANYTHING, even less for the desktop. We have no valid operating systems under the sun. We have playing consoles and jackpot machines, but we prefer to call them "computers".

UPDATE! Please read the comment by Gordon Messmer (I guess it's #31). It makes a lot of sense! (Still, there is no proof of a hardware malfunction.)

NOTE (June 4): I noticed a whole bunch of visits from Linux Today. So far, I have no problem with this. Except that the dozen comments (talkbacks) on LT are mostly offensive and abusive toward me. They range from Steve Stites questioning my style and declaring it "hysterical FUD", to various ways of questioning my IT knowledge: for instance here, and more elaborately here.

The list of those 8 (0 to 7) items from the last quoted talkback was written in contempt of all common sense. To answer some other comments: of course I have experienced much more severe issues in all these years, even the dumbest thing ever, which was to "recover" a mirrored HDD by overwriting it with the data from the broken HDD. But for most of the issues I had, I could blame Windows, MS-DOS, a hardware controller, a faulty driver, the user between the chair and the desk, or whatever else. The problem with the issue I posted about is that it seemed to have no logical explanation whatsoever! (And I don't believe in magic.) Right now, Gordon Messmer provides a very plausible scenario, but from the user's standpoint, dragging a couple of icons a little just to see them vanish makes no sense!

Back to the list of 8 items: the writer is either an idiot, or evil and an idiot. Of course I had to reboot, simply because (as fsck puts it): "WARNING!!! Running e2fsck on a mounted filesystem may cause SEVERE filesystem damage." Then, how on Earth can anyone recover the "list of processes at crash time"? It's not saved in any system log! Also, there was no crash dump, because there wasn't any SIGSEGV; it was a system freeze that required a long push of the ON/OFF switch! (Once again: is he an idiot, or evil and an idiot?)

All the "required" data is irrelevant, as I couldn't reproduce this bug anymore, hence it wasn't about filing a report on some component: how can you tell whether it was about Nautilus or not? It wasn't about Debian testing being "testing" or "unstable", it was about the complete senselessness of the whole thing! It was the exact Debian testing collection of software (with the latest official kernel build) that I had been running for more than one week. The only updates I had not applied were minor patches for system components that were installed but that I was not actually using (so I ignored those few patches).

Because of the dumb aggressiveness of some of the comments by readers coming from Linux Today, I have already had to delete a few. They're accusing me of FUD and of being paid by Microsoft, but I bet some of them are using a Linux flavor by Novell... which is indeed paid by Microsoft!

Original link: http://beranger.org/index.php?pa...