[Lustre-discuss] OST crash with group descriptors corrupted

Andreas Dilger adilger at sun.com
Tue Mar 10 12:47:39 PDT 2009


On Mar 10, 2009  23:42 +0800, thhsieh wrote:
> I am wondering that, whether it is possible to give up that
> problematic OST, and only make the other OSTs active, so that
> we can rescue part of the data files ?

Yes, this is always possible.  Just mount Lustre as normal, and
on all clients + MDS run "lctl set_param osc.*OST{number}*.active=0"
so that they will return an EIO error instead of hanging while
waiting for the failed OST to recover.
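As a sketch of that step (OST0002 is a placeholder index for the
failed target; substitute your own):

```shell
# Run on every client AND on the MDS.  "OST0002" is a placeholder
# for the index of the failed OST -- adjust to your system.
lctl set_param osc.*OST0002*.active=0

# Verify: the failed OST should show active=0, all others active=1.
lctl get_param osc.*.active
```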

That said, I don't think this is necessarily a fatal problem.

> Now we have totally 6 OSTs, and one of the OST has problem
> that I have no idea to repair now. If now I only activate
> the 5 OSTs, could I get back (at most) 5/6 of data files,
> or I can just get back junk files (since the files are
> divided into fragments and are distributed into all OSTs) ?

Lustre by default places each file on a single OST, so you
should be able to get back 5/6 of your files.

> If only activates the 5 OSTs can get back some data files,
> what's the procedure I could do ?

Use "lfs find --obd {OST_UUID} /mount/point" to find the files that
have objects on the failed OST.  Hmm, there isn't currently a way to
specify the opposite (something like "lfs find ! --obd {OST_UUID}");
please file a bug for that, as it is relatively easy to implement
(or you could take a crack at it yourself in
lustre/utils/lfs.c::lfs_find()).
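For example (the UUID and mount point below are placeholders; you can
check the real OST UUIDs with "lctl dl" on a client):

```shell
# List files with objects on the failed OST.  The fsname/index in
# the UUID and the mount point are placeholders -- adjust as needed.
lfs find --obd lustre-OST0002_UUID /mnt/lustre > /tmp/files-on-failed-ost

# For any individual file, "lfs getstripe" shows which OST(s) hold
# its objects, so you can confirm placement before copying it out.
lfs getstripe /mnt/lustre/path/to/file
</imports>
```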

> Because I am under time pressure to recover the system.
> Hence I am considering the worst situation ....

I think you can possibly recover this OST.

> > > You did that with or without "-n" in the command arguments?
> > > 
> > > > [80083.964462] LDISKFS-fs: group descriptors corrupted!
> > > > [81423.119834] LDISKFS-fs error (device sdb1): ldiskfs_check_descriptors: Checksum for group 11165 failed (0!=20224)

It looks like this is a simple bug in the ldiskfs code AND in the e2fsck
code.  The feature that enables group checksums (uninit_bg) was disabled
in the superblock for some reason, but e2fsck didn't clear the checksum
from disk.  Now, the kernel is returning "0" for the checksum (because
this feature is disabled) but there is an old checksum value on disk.

The easiest way to fix this (short of modifying the kernel and/or e2fsck)
is to re-enable the uninit_bg feature, and re-run e2fsck.  Note that
running with uninit_bg is preferable in any case, as it improves performance.

# tune2fs -O uninit_bg /dev/XXX
# e2fsck -fy /dev/XXX

This will report an error for each of the checksum values and correct
them, and then hopefully your filesystem can be mounted again.  Please
file a separate bug on this as well: the uninit_bg code needs to be
fixed to ignore the on-disk checksum when the feature is disabled, and
e2fsck needs to zero this value when the feature is disabled.

> > > > We did the e2fsck (version 1.41.4) on all the OST partitions.

Note that e2fsck 1.41.4 is the upstream e2fsprogs, not the Lustre-patched
e2fsprogs-1.40.11.sun1.  While the majority of the Lustre (now ext4)
functionality is included in 1.41.4, it isn't all there.  In this case I
don't know whether it matters or not.

Also note that the "uninit_bg" feature was called "uninit_groups" in the
1.40.11 release of Lustre e2fsprogs (this was changed beyond our control),
so adjust the above steps accordingly.
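With the Lustre-patched 1.40.11 tools, the same procedure would look
like this (same /dev/XXX placeholder as above):

```shell
# Same fix, older feature name (Lustre e2fsprogs 1.40.11):
tune2fs -O uninit_groups /dev/XXX
e2fsck -fy /dev/XXX
```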

> > > Hrm.  I don't know enough about the innards of ext3 to parse that.
> > > Maybe (well, no maybes about it) Andreas will know if he is reading.
> > > 
> > > > We tried e2fsck with superblock 32768, but after some error
> > > > corrections again we encounter the same problem.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



