[Lustre-discuss] Lustre 1.6.7 kernel panics when umounting MDTs

IMB Mail c.schull at imb.uq.edu.au
Mon Apr 6 19:17:52 PDT 2009


Thank you all for the feedback regarding the "inode out of bounds" issue,
and I look forward to the upcoming binary patch / release.

We have found some other interesting filesystem issues (probably related to
the bug under discussion) whereby some directories are full of files that
have no associated inodes.  There are no recent log entries indicating
problems, so I assume they are left over from the previous "inode out of
bounds" issue and were not fully repaired by the fsck that was subsequently
run.


From ls -l:

?--------- ? ? ? ?            ? uniprot_trembl_SwissGeneSynonym_99.lc
?--------- ? ? ? ?            ? uniprot_trembl_typ__168.inx
?--------- ? ? ? ?            ? uniprot_trembl_typ__171.inx
?--------- ? ? ? ?            ? uniprot_trembl_typ__173.inx
?--------- ? ? ? ?            ? uniref100_cks.inx

# rm uniref100_cks.inx
rm: cannot lstat `uniref100_cks.inx': No such file or directory
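
If these really are just dangling directory entries left behind on the MDT,
my understanding is that they can only be cleaned up from the backing
ldiskfs filesystem rather than from a client.  The rough plan I have in
mind is below; the /mnt/mdt mount point is just a placeholder, and the
device path is the same MDT device shown in the fsck output from my
original post:

# umount /mnt/mdt
# e2fsck -fn /dev/mapper/mpath11p1    # read-only pass to preview the repairs
# e2fsck -fy /dev/mapper/mpath11p1    # full pass; salvaged entries go to lost+found
# mount -t lustre /dev/mapper/mpath11p1 /mnt/mdt

Please let me know if that is the wrong approach.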



** The other issue I was wondering about in my initial post was whether a
kernel panic is expected if I simply umount an MDT on the lustre server
(whether or not clients have the filesystem mounted).  If not, what steps
should I take to prevent this from occurring in the future?
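
For reference, the shutdown sequence I have been trying to follow is the
one sketched below; the mount points are only placeholders for our setup,
so please correct me if there is a safer order or if extra steps are
needed.

On every client first:

# umount /lustre/qfab

Then on the server, the MDT followed by the OSTs:

# umount /mnt/qfab-mdt
# umount /mnt/qfab-ost00
# umount /mnt/qfab-ost01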



Thanks again for the advice and responses received.

Marcus.




Marcus Schull,
Systems Administrator
IMB, University of Queensland











On 2/04/09 3:58 PM, "Jason Rappleye" <jason.rappleye at nasa.gov> wrote:

> 
> Hi,
> 
> On Apr 1, 2009, at 7:21 PM, Marcus Schull wrote:
> 
>> Lustre Admins,
>> 
>> 
>> We have recently had 2 odd events I am keen to get some feedback on.
>> 
>> 1) We recently had a filesystem go read-only (noticed on the client)
>> due to an "inode out of bounds" error as reported in the server logs.
>> Please see below for log and fsck details.
> 
> It looks like you're experiencing corruption described in bz 18695,
> which was also reported by us in bz 18889 and by TACC in bz 19029.
> 
> At this point in time, Sun hasn't made any official recommendations
> regarding this problem. If you can stomach it, I'm certain that Sun
> (and other sites experiencing this problem) would appreciate feedback
> on what the user whose data was corrupted was doing at the time. If
> you can reproduce the problem, even better.
> 
> Unfortunately, by the time ldiskfs detects the corruption, it's
> already happened - it is detecting corrupted metadata that has already
> been written to disk, not the instant at which the corruption is
> occurring. When we first discovered the corruption at our site, we had
> a user report errors when trying to perform ls and rm commands in a
> corrupted directory. ldiskfs did not report any errors. Unmounting the
> MDT and running e2fsck on it produced the same type of errors you saw
> during your fsck. We remounted the MDT and later that day ldiskfs did
> produce the "inode out of bounds" error that you saw. So, either the
> corruption occurred again, or e2fsck didn't fix the problem.
> 
> This happened a day after upgrading to Lustre 1.6.7. We have since
> downgraded our servers to 1.6.6. We're running SLES10 SP2 with kernel
> 2.6.16.60-0.35 + a few site specific patches for OOM handling
> (developed in house in conjunction with SGI) and one for improved SCSI
> error handling that came from Novell, through SGI. So, this problem
> doesn't appear to be specific to the RHEL or SLES kernel, or to our
> patched kernel.
> 
> It's worth noting that we only saw this on the MDTs on a pair of
> filesystems. We ran e2fsck against all of the OSTs (a total of 90) and
> all came back clean.
> 
> Is there anyone else out there running 1.6.7 that is seeing this
> problem as well? It might be a good idea to unmount your MDTs and run
> e2fsck against them, and report the results back to the mailing list.
> 
> j
> 
>> 
>> 
>> 2) In order to correct the issue (ie remount the device read-write), I
>> attempted to umount that device (intending to do a quick fsck as the
>> device was the filesystem's MDT, and then remounting) - however this
>> action caused a kernel panic.
>> 
>> I have experienced these kernel panics quite a few times in the past
>> when umounting MDTs and OSTs in similar situations - when one or more
>> have gone read-only, but this is the first with the 1.6.7 kernel.
>> The resulting panic required a reboot and fsck on all mounted LUNs -
>> which at present is about 15 TB (in a few separate filesystems).
>> 
>> The server currently hosts the MGS, MDT and OST partitions for
>> all lustre filesystems.  Unfortunately we have still not had the
>> chance to separate those roles onto different servers.
>> The server is running 64 bit RHEL 5.2 with the latest lustre kernel
>> (1.6.7) and associated packages.  It runs on a SUN blade with 4 AMD
>> cores and 16 GB RAM.
>> [ kernel version: 2.6.18-92.1.17.el5_lustre.1.6.7smp #1 SMP Mon Feb 9
>> 19:56:55 MST 2009 x86_64 x86_64 x86_64 GNU/Linux ]
>> 
>> All clients are also running RHEL 5.2/5.3 with unpatched kernels and
>> the latest lustre packages (1.6.7).
>> [ client kernel: 2.6.18-92.1.17.el5 #1 SMP Wed Oct 22 04:19:38 EDT
>> 2008 x86_64 x86_64 x86_64 GNU/Linux ]
>> 
>> 
>> I look forward to any advice regarding the unusual error (1) or any
>> procedures to follow in order to prevent the kernel panics.
>> 
>> 
>> Thanks in advance.
>> 
>> Marcus Schull,
>> Systems Administrator
>> IMB, University of Queensland
>> 
>> 
>> 
>> 
>> 
>> 
>> Lustre server /var/log/messages:
>> 
>> Apr  1 19:03:04 lustre1 kernel: LDISKFS-fs error (device dm-37):
>> ldiskfs_add_entry: bad entry in directory #12686675: inode out of
>> bounds - offset=0, inode=2807365078, rec_len=4096, name_len=36
>> Apr  1 19:03:04 lustre1 kernel: Remounting filesystem read-only
>> Apr  1 19:03:04 lustre1 kernel: LDISKFS-fs error (device dm-37) in
>> start_transaction: Readonly filesystem
>> Apr  1 19:03:04 lustre1 kernel: LustreError: 8059:0:(fsfilt-ldiskfs.c:
>> 1231:fsfilt_ldiskfs_write_record()) can't start transaction for 18
>> blocks (128 bytes)
>> Apr  1 19:03:04 lustre1 kernel: LustreError: 8059:0:(mds_reint.c:
>> 230:mds_finish_transno()) @@@ wrote trans #0 rc -5 client
>> 5142a23c-5adc-d4e8-4375-1b1028fe8e7d at idx 0: err = -30
>> req@ffff8101771ba000 x28445861/t0 o101->51
>> 42a23c-5adc-d4e8-4375-1b1028fe8e7d@NET_0x200008266732c_UUID:0/0 lens
>> 512/568 e 0 to 0 dl 1238576684 ref 1 fl Interpret:/0/0 rc 0/0
>> Apr  1 19:03:04 lustre1 kernel: LustreError: 8059:0:(mds_reint.c:
>> 238:mds_finish_transno()) wrote objids: err = 0
>> Apr  1 19:03:04 lustre1 kernel: LustreError: 11330:0:(fsfilt-
>> ldiskfs.c:
>> 280:fsfilt_ldiskfs_start()) error starting handle for op 8 (33
>> credits): rc -30
>> Apr  1 19:03:04 lustre1 kernel: LustreError: 11330:0:(fsfilt-
>> ldiskfs.c:
>> 280:fsfilt_ldiskfs_start()) error starting handle for op 8 (33
>> credits): rc -30
>> Apr  1 19:03:04 lustre1 kernel: LustreError: 11330:0:(mds_reint.c:
>> 154:mds_finish_transno()) fsfilt_start: -30
>> Apr  1 19:03:04 lustre1 kernel: LustreError: 11337:0:(mds_reint.c:
>> 154:mds_finish_transno()) fsfilt_start: -30
>> 
>> 
>> 
>> fsck output of filesystem with error reported above:
>> 
>> [root@lustre1 ~]# e2fsck /dev/mapper/mpath11p1
>> e2fsck 1.40.11.sun1 (17-June-2008)
>> qfab-MDT0000: recovering journal
>> qfab-MDT0000 contains a file system with errors, check forced.
>> Pass 1: Checking inodes, blocks, and sizes
>> Inode 12686675, i_size is 28672, should be 65536.  Fix<y>? yes
>> 
>> Pass 2: Checking directory structure
>> Problem in HTREE directory inode 12686675: node (0) has an unordered
>> hash table
>> Clear HTree index<y>? yes
>> 
>> Entry '(=fM-'^\^@ ^@M-~^JM-XM-'p^H$^@:{M-XM-'^H^G$^@TM-O^IM-(x^L(^@t^W
>> $M-(' in /ROOT/data1/offindex (12686675) has invalid inode #:
>> 2807365078.
>> Clear<y>? yes
>> 
>> Pass 3: Checking directory connectivity
>> Pass 3A: Optimizing directories
>> Pass 4: Checking reference counts
>> Unattached inode 12681396
>> Connect to /lost+found<y>? yes
>> 
>> Inode 12681396 ref count is 2, should be 1.  Fix<y>? yes
>> 
>> Unattached inode 12681397
>> Connect to /lost+found<y>? yes
>> 
>> ... etc
>> 
>> 
>> 
>> 
> 




