[Lustre-discuss] OST crash with group descriptors corrupted

Tue Mar 10 22:27:42 PDT 2009

Hello,

Thanks so much to Andreas, Megan, and Brian. Following the suggestions
of Andreas, now all the OSTs are recovered and functional.

Because of heavy time pressure, and because I am unfortunately out of the
office with only limited network access, I could not get the Lustre-patched
e2fsprogs-1.40.11.sun1 to work, so I am still using e2fsprogs-1.41.4.

After doing

# tune2fs -O uninit_bg /dev/XXX
# e2fsck -fy /dev/XXX

I still could not run "tunefs.lustre --writeback /dev/xxx". The kernel
complained about "Missing journal", so I tried running:

# tune2fs -j /dev/XXX

This time everything works !!!!!  :)
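
One way to double-check the result is to look at the feature list in the
superblock. The following is only a sketch: it assumes "dumpe2fs -h" prints a
"Filesystem features:" line, and has_feature is a hypothetical helper name,
not something shipped with e2fsprogs.

```shell
# has_feature FEATURE
# Read "dumpe2fs -h" output on stdin and succeed if FEATURE is listed on
# the "Filesystem features:" line.
# (has_feature is a hypothetical helper, not part of e2fsprogs.)
has_feature() {
    feature=$1
    # Keep only the feature list, put one feature per line, match exactly.
    sed -n 's/^Filesystem features:[[:space:]]*//p' \
        | tr -s ' ' '\n' \
        | grep -x -- "$feature" > /dev/null
}

# Usage (as root, with the OST unmounted; /dev/XXX is a placeholder):
#   dumpe2fs -h /dev/XXX 2>/dev/null | has_feature has_journal && echo "journal present"
#   dumpe2fs -h /dev/XXX 2>/dev/null | has_feature uninit_bg && echo "uninit_bg enabled"
```

With the 1.40.11 Lustre e2fsprogs the second check would need to look for
"uninit_groups" instead, as noted in the quoted reply below.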

I have now brought the system online so that users can download their data.
To be safe, I guess I still need to run:

lfs find --obd {OST_UUID} /mount/point

or is there anything else I need to do to ensure the consistency of the
Lustre filesystem?

Please give me suggestions.

Thanks very much again.

T.H.Hsieh

On Tue, Mar 10, 2009 at 01:47:39PM -0600, Andreas Dilger wrote:
> On Mar 10, 2009  23:42 +0800, thhsieh wrote:
> > I am wondering whether it is possible to give up on that
> > problematic OST and activate only the other OSTs, so that
> > we can rescue part of the data files?
> 
> Yes, this is always possible.  Just mount lustre as normal, and
> on all clients + MDS run "lctl set_param osc.*OST{number}*.active=0"
> so that they will return an EIO error instead of hanging and waiting
> for the failed OST to return.
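
As an illustration of the command above, the parameter pattern can be built
from the OST index. This is a sketch under the assumption that OST targets
are named with a four-digit hexadecimal index (for example OST0005); the
helper name osc_deactivate_cmd is made up for this example, and it only
prints the command rather than running it.

```shell
# osc_deactivate_cmd INDEX
# Print (do not run) the lctl command that marks the OSC for the OST with
# decimal index INDEX as inactive.
# Assumption: targets are named OST<4 hex digits>, e.g. OST000a for index 10.
osc_deactivate_cmd() {
    printf 'lctl set_param osc.*OST%04x*.active=0\n' "$1"
}

# Usage: inspect the output first, then run it on every client and the MDS.
#   osc_deactivate_cmd 5
```

When running the printed command by hand, quoting the osc.* pattern keeps the
shell from trying to expand it against local file names.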
> 
> That said, I don't think this is necessarily a fatal problem.
> 
> > We now have 6 OSTs in total, and one of them has a problem
> > that I have no idea how to repair. If I activate only
> > the other 5 OSTs, could I get back (at most) 5/6 of the data files,
> > or would I just get back junk files (since the files are
> > divided into fragments and distributed across all OSTs)?
> 
> Lustre by default places each file on a single OST, so you
> should be able to get back 5/6 of your files.
> 
> > If activating only the 5 OSTs can recover some data files,
> > what procedure should I follow?
> 
> Use "lfs find --obd {OST_UUID} /mount/point" to find files that
> are on the failed OST.  Hmm, there isn't a way to specify the
> opposite, however (i.e. "lfs find ! --obd {OST_UUID}"); please file a
> bug for that, as it is relatively easy to implement (or you could
> take a crack at it in lustre/utils/lfs.c::lfs_find()).
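
Until such an option exists, the opposite list can be approximated outside of
lfs by subtracting one listing from another. This is a sketch, not a Lustre
feature: files_not_on_ost is a hypothetical helper, the file names in the
usage lines are placeholders, and it assumes path names contain no newlines.

```shell
# files_not_on_ost ALL_LIST OST_LIST
# Print the paths in ALL_LIST that do not appear in OST_LIST
# (both files hold one path per line).
# Hypothetical helper; assumes no newlines inside path names.
files_not_on_ost() {
    all_sorted=$(mktemp) || return 1
    ost_sorted=$(mktemp) || { rm -f "$all_sorted"; return 1; }
    # comm needs both inputs sorted in the same collation order.
    LC_ALL=C sort -u "$1" > "$all_sorted"
    LC_ALL=C sort -u "$2" > "$ost_sorted"
    # -23 drops lines unique to the second file and lines common to both,
    # leaving only the paths that are not on the failed OST.
    LC_ALL=C comm -23 "$all_sorted" "$ost_sorted"
    status=$?
    rm -f "$all_sorted" "$ost_sorted"
    return $status
}

# Usage ({OST_UUID} and /mount/point as in the command above):
#   lfs find /mount/point > all.txt
#   lfs find --obd {OST_UUID} /mount/point > failed.txt
#   files_not_on_ost all.txt failed.txt > recoverable.txt
```

The full listing may also contain directories, so the result is best treated
as a candidate list rather than an exact set of recoverable files.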
> 
> > I am under time pressure to recover the system, so
> > I am considering the worst case ....
> 
> I think you can possibly recover this OST.
> 
> > > > You did that with or without "-n" in the command arguments?
> > > > 
> > > > > [80083.964462] LDISKFS-fs: group descriptors corrupted!
> > > > > [81423.119834] LDISKFS-fs error (device sdb1): ldiskfs_check_descriptors: Checksum for group 11165 failed (0!=20224)
> 
> It looks like this is a simple bug in the ldiskfs code AND in the e2fsck
> code.  The feature that enables group checksums (uninit_bg) was disabled
> in the superblock for some reason, but e2fsck didn't clear the checksum
> from disk.  Now, the kernel is returning "0" for the checksum (because
> this feature is disabled) but there is an old checksum value on disk.
> 
> The easiest way to fix this (short of modifying the kernel and/or e2fsck)
> is to re-enable the uninit_bg feature, and re-run e2fsck.  Note that
> running with uninit_bg is preferable in any case, as it improves performance.
> 
> # tune2fs -O uninit_bg /dev/XXX
> # e2fsck -fy /dev/XXX
> 
> This will report an error for all of the checksum values and correct
> them, but then hopefully your filesystem can be mounted again.  Please
> file a separate bug on this, it needs to be fixed in our uninit_bg code
> to ignore the checksum if the feature is disabled, and in e2fsck to zero
> this value if the feature is disabled.
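
The mismatch described above can be modelled in a few lines. This is a toy
model only, not the ldiskfs or e2fsck code: the function names are invented,
and the number 20224 is simply the value from the kernel message quoted
earlier.

```shell
# Toy model of the group checksum check (not the real ldiskfs code).
# Arguments: FEATURE_ON (1 or 0), STORED (value on disk), COMPUTED.

# Behaviour as described: with the feature off the expected checksum is 0,
# but it is still compared against whatever is stored on disk.
buggy_check() {
    feature_on=$1
    stored=$2
    computed=$3
    [ "$feature_on" -eq 1 ] || computed=0
    [ "$stored" -eq "$computed" ]
}

# Proposed kernel-side behaviour: ignore the stored checksum entirely when
# the feature is disabled.
fixed_check() {
    [ "$1" -eq 1 ] || return 0
    [ "$2" -eq "$3" ]
}

# Example: feature disabled, stale checksum 20224 still on disk.
#   buggy_check 0 20224 20224   -> fails (0 != 20224)
#   fixed_check 0 20224 20224   -> succeeds
```

Re-enabling uninit_bg and re-running e2fsck, as suggested above, corresponds
to the first argument being 1 with matching stored and computed values, in
which case both variants succeed.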
> 
> > > > > We did the e2fsck (version 1.41.4) on all the OST partitions.
> 
> Note that e2fsck 1.41.4 is the upstream e2fsprogs, and not the Lustre-patched
> e2fsprogs-1.40.11.sun1.  While the majority of Lustre (now ext4) functionality
> is included in 1.41.4, it isn't all there.  In this case I don't know if it
> matters or not.
> 
> Also note that the "uninit_bg" feature was called "uninit_groups" in the
> 1.40.11 release of Lustre e2fsprogs (this was changed beyond our control),
> so adjust the above steps accordingly.
> 
> > > > Hrm.  I don't know enough about the innards of ext3 to parse that.
> > > > Maybe (well, no maybes about it) Andreas will know if he is reading.
> > > > 
> > > > > We tried e2fsck with superblock 32768, but after some error
> > > > > corrections again we encounter the same problem.
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>