[Lustre-discuss] corrupted OSTs on server, advice needed

Samuel Aparicio saparicio at bccrc.ca
Tue Jul 3 18:09:02 PDT 2012

We had an OST server go down uncleanly, and two of its OSTs now appear to have problems.
We are running the lustre-2.1.1 maintenance release.

The first OST reports the following:
e2fsck -p -j /dev/etherd/e18.21p2 /dev/md141
lustre2-OST0006: Note: if several inode or block bitmap blocks or part
of the inode table require relocation, you may wish to try
running e2fsck with the '-b 32768' option first.  The problem
may lie only with the primary block group descriptors, and
the backup block group descriptors may be OK.

lustre2-OST0006: Block bitmap for group 960 is not in group.  (block 18446744073709551615)

	(i.e., without -a or -p options)

I noticed in an old thread that something similar happened elsewhere and was recovered with
e2fsck -fp -b 32768 <device>
followed by e2fsck -fy <device>

Would this be safe to do? An alternative suggested by Andreas Dilger in that thread was to delete the external journal, run e2fsck, and then re-add the journal afterwards.
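
To make the question concrete, the two options could be sketched roughly as below. This is a dry-run sketch, not a tested procedure. The initial read-only pass (-n), the -j option on each e2fsck call, and the tune2fs invocations are my assumptions rather than commands quoted in that thread, and 32768 is only the first backup superblock if the filesystem uses 4 KiB blocks. The run helper and the EXECUTE and PLAN variables are hypothetical names; the script only prints the commands unless EXECUTE=1 is set.

```shell
#!/bin/sh
# Dry-run sketch: prints each command; set EXECUTE=1 to really run them.
# Device paths are the ones from the e2fsck output above.
OST_DEV=${OST_DEV:-/dev/md141}
JOURNAL_DEV=${JOURNAL_DEV:-/dev/etherd/e18.21p2}
BACKUP_SB=${BACKUP_SB:-32768}   # assumption: 4 KiB block size

run() {
    # Echo the command, and execute it only when EXECUTE=1.
    echo "+ $*"
    if [ "${EXECUTE:-0}" = "1" ]; then
        "$@"
    fi
}

# Option 1: use the backup superblock, then do a full pass.
option_backup_superblock() {
    # Read-only look first (my addition): -n answers "no" to every prompt.
    run e2fsck -fn -b "$BACKUP_SB" -j "$JOURNAL_DEV" "$OST_DEV"
    # The two commands from the old thread, with -j added (assumption).
    run e2fsck -fp -b "$BACKUP_SB" -j "$JOURNAL_DEV" "$OST_DEV"
    run e2fsck -fy -j "$JOURNAL_DEV" "$OST_DEV"
}

# Option 2: drop the external journal, check, then re-add the journal.
# Note: removing the journal discards any transactions not yet replayed.
option_remove_journal() {
    run tune2fs -f -O '^has_journal' "$OST_DEV"
    run e2fsck -fy "$OST_DEV"
    run tune2fs -J device="$JOURNAL_DEV" "$OST_DEV"
}

# PLAN=journal selects option 2; anything else selects option 1.
if [ "${PLAN:-backup}" = "journal" ]; then
    option_remove_journal
else
    option_backup_superblock
fi
```

Run as-is, this prints the three e2fsck commands for option 1 without touching either device.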

The second OST reports the following:
e2fsck -p -j /dev/etherd/e18.21p6 /dev/md142
lustre2-OST0009: External journal does not support this filesystem

This is strange, because this IS the external journal for this filesystem.
Any ideas on how to proceed with this one would be gratefully received.
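
One check that might narrow this down is to compare the UUIDs recorded on the two devices. As far as I understand it, e2fsck prints this message when the journal's user list does not name the filesystem it is asked to serve. A minimal sketch, assuming dumpe2fs -h prints the labels used below ("Filesystem UUID", "Journal UUID", "Journal users"), which may differ between e2fsprogs versions. The field helper is a hypothetical name, not part of e2fsprogs.

```shell
#!/bin/sh
# Sketch only: read-only comparison of the UUIDs on the OST and its journal.
OST_DEV=${OST_DEV:-/dev/md142}
JOURNAL_DEV=${JOURNAL_DEV:-/dev/etherd/e18.21p6}

# Print the value of the first "Label: value" line read from stdin.
field() {
    sed -n "s/^$1:[[:space:]]*//p" | head -n 1
}

# Only query the devices if dumpe2fs and both block devices are present.
if command -v dumpe2fs >/dev/null 2>&1 && [ -b "$OST_DEV" ] && [ -b "$JOURNAL_DEV" ]; then
    ost_info=$(dumpe2fs -h "$OST_DEV" 2>/dev/null)
    jnl_info=$(dumpe2fs -h "$JOURNAL_DEV" 2>/dev/null)

    # The OST's own UUID should appear in the journal's user list.
    echo "OST filesystem UUID:       $(printf '%s\n' "$ost_info" | field 'Filesystem UUID')"
    echo "Journal users (first):     $(printf '%s\n' "$jnl_info" | field 'Journal users')"

    # The journal UUID the OST expects should match the journal device's UUID.
    echo "Journal UUID wanted by OST: $(printf '%s\n' "$ost_info" | field 'Journal UUID')"
    echo "Journal device UUID:        $(printf '%s\n' "$jnl_info" | field 'Filesystem UUID')"
fi
```

If either pair differs, that would at least say which side of the journal association was damaged.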

