[Lustre-discuss] OST crash with group descriptors corrupted

Brian J. Murrell Brian.Murrell at Sun.COM
Mon Mar 9 11:13:15 PDT 2009


On Mon, 2009-03-09 at 19:39 +0800, thhsieh wrote:
> Dear All,
> 
> We have an emergency with our Lustre filesystem.
> 
> But today
> we encountered a hardware problem with the disk array (one of the
> hard disks in the RAID 6 array failed), and soon after that the Lustre
> filesystem crashed, too.

> The dmesg message shows:
> 
> [ 3314.530762] LDISKFS-fs error (device sdb1): ldiskfs_check_descriptors: Block bitmap for group 11152 not in group (block 3407085568)!
> [ 3314.531701] LDISKFS-fs: group descriptors corrupted!

It looks like your disk error has resulted in on-disk corruption.
AFAIK, RAID is supposed to prevent this.  No idea why it didn't in this
case.  Maybe check with your RAID vendor.
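
To see why the kernel rejects that descriptor, the quoted dmesg line can be
checked by hand: the block bitmap of a group is expected to sit inside that
group's own block range. The sketch below parses the message and does the
arithmetic. The geometry is an assumption, not something stated in this
thread: 32768 blocks per group (the usual value for 4 KiB blocks), first data
block 0, and no flex_bg-style layout. The real values can be read from the
superblock with "dumpe2fs -h".

```python
import re

# Assumed geometry (NOT taken from the thread): 4 KiB blocks give
# 32768 blocks per group, and the first data block is 0.
BLOCKS_PER_GROUP = 32768
FIRST_DATA_BLOCK = 0

# Matches the ldiskfs/ext3 message quoted in the post.
PATTERN = re.compile(
    r"Block bitmap for group (\d+) not in group \(block (\d+)\)"
)


def group_block_range(group, blocks_per_group=BLOCKS_PER_GROUP,
                      first_data_block=FIRST_DATA_BLOCK):
    """Return (first, last) block numbers belonging to a block group."""
    first = first_data_block + group * blocks_per_group
    return first, first + blocks_per_group - 1


def check_line(line):
    """Parse one dmesg line; return None if it is not a bitmap error."""
    match = PATTERN.search(line)
    if match is None:
        return None
    group, block = int(match.group(1)), int(match.group(2))
    first, last = group_block_range(group)
    return {
        "group": group,
        "block": block,
        "first": first,
        "last": last,
        "in_group": first <= block <= last,
    }


line = ("[ 3314.530762] LDISKFS-fs error (device sdb1): "
        "ldiskfs_check_descriptors: Block bitmap for group 11152 "
        "not in group (block 3407085568)!")
result = check_line(line)
print(result)
```

Under these assumptions the recorded bitmap block lies far outside the range
of group 11152, which is consistent with a damaged group descriptor rather
than a bitmap that is merely a few blocks off.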

> It seems that the backend ext3 file system is still there, but has
> errors.

Indeed.

> Could anyone suggest a way to recover the OST partitions? Can I use
> e2fsck to fix the problems of the OST partitions?

Yes, e2fsck should correct the problem(s).  Be aware that the only way
for e2fsck to restore the filesystem to a consistent state may be to
move or remove data from it.  To what extent will depend entirely on
how much on-disk corruption has taken place.

You can get an idea of what e2fsck will do without actually doing
anything to the disk data by giving it the "-n" argument.  You can
decide based on that "dry-run" e2fsck output whether the corrective
action it will take is acceptable to you.
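
A minimal sketch of scripting that dry run follows. The device path
/dev/sdb1 is an assumption derived from the "device sdb1" in the dmesg
output, and the helper names are hypothetical. The "-n" option and the
exit-status bits are those documented in the e2fsck man page. The OST should
be unmounted before running this.

```python
import subprocess

# Exit status bits as documented in the e2fsck(8) man page.
E2FSCK_EXIT_BITS = {
    1: "file system errors corrected",
    2: "file system errors corrected, system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "e2fsck canceled by user request",
    128: "shared library error",
}


def build_dry_run_command(device):
    """Build the read-only check: -n answers 'no' to every question."""
    return ["e2fsck", "-n", device]


def describe_exit_status(code):
    """Decode the e2fsck exit status bitmask into readable messages."""
    if code == 0:
        return ["no errors"]
    return [text for bit, text in sorted(E2FSCK_EXIT_BITS.items())
            if code & bit]


def dry_run(device):
    """Run the dry run and return (report text, decoded exit status)."""
    proc = subprocess.run(build_dry_run_command(device),
                          capture_output=True, text=True)
    return proc.stdout, describe_exit_status(proc.returncode)
```

Calling dry_run("/dev/sdb1") as root would return the report to review
before deciding on a real repair. Because "-n" never modifies the disk, a
corrupted filesystem is expected to end with "file system errors left
uncorrected".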

b.
