[Lustre-discuss] fsck of OST problems - endless loop restarting pass 1
Craig Prescott
prescott at hpc.ufl.edu
Tue Dec 1 12:56:51 PST 2009
Hope someone can help us out with this one.
We are running Lustre 1.8.1.1. One of our two OSS nodes (12 OSTs)
became unresponsive on Sunday night. We issued an IPMI power cycle.
After the node was back up, we tried to fsck the OSTs
(e2fsprogs-1.41.6.sun1-0redhat.x86_64) with 'fsck -f -y'. Eleven of the
twelve OSTs fsck'd normally. The 12th OST showed heavy corruption, with
many inodes moved to /lost+found. This fsck never finished, and we
killed it after ~14 hours.
All further fsck attempts seem to endlessly get kicked back to pass 1
after many zero dtime corrections, and relocating many group block
bitmaps, inode bitmaps, and inode tables. It seems that many of these
changes are never written out to the filesystem, as we encounter the
same corrections on subsequent pass 1 restarts. Actually, it looks like
every *other* attempt to run pass 1 yields similar output, as if fsck is
bouncing back and forth between two solutions.
We have tried e2fsprogs 1.41.6.sun1-0redhat and 1.41.9 from SourceForge.
Logs (enormous) of the fsck attempts are available here:
http://hpc.ufl.edu/logs/fsck.log.1.41.9.gz (2 full pass 1 fsck attempts)
http://hpc.ufl.edu/logs/fsck.log.1.41.6.gz (4 full pass 1 fsck attempts)
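The impression that every other pass 1 attempt produces similar output could be checked mechanically against logs like these. The following is only a rough sketch in Python, not something that was run as part of this report: it assumes each attempt begins with the "Pass 1: Checking inodes, blocks, and sizes" line seen in the excerpt below, and the helper names (split_pass1_attempts, jaccard, alternation_report) are made up for illustration. It collects the distinct messages of each attempt into a set and compares attempts that are one apart and two apart.

```python
from typing import Iterable, List, Set, Tuple

# Marker line taken from the fsck output quoted in this post.
PASS1_MARKER = "Pass 1: Checking inodes, blocks, and sizes"


def split_pass1_attempts(lines: Iterable[str]) -> List[Set[str]]:
    """Return one set of distinct log lines per pass 1 attempt.

    Lines before the first marker are ignored. Everything after a marker
    (including output of later passes, if any) is counted in that attempt
    until the next marker appears.
    """
    attempts: List[Set[str]] = []
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith(PASS1_MARKER):
            current = set()
            attempts.append(current)
        elif current is not None and line:
            current.add(line)
    return attempts


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: shared lines divided by all distinct lines."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def alternation_report(
    attempts: List[Set[str]],
) -> Tuple[List[float], List[float]]:
    """Compare attempt i with attempt i+1 and with attempt i+2.

    If fsck really alternates between two states, the second list
    (two apart) should be close to 1.0 and clearly higher than the
    first list (one apart).
    """
    adjacent = [
        jaccard(attempts[i], attempts[i + 1])
        for i in range(len(attempts) - 1)
    ]
    skip_one = [
        jaccard(attempts[i], attempts[i + 2])
        for i in range(len(attempts) - 2)
    ]
    return adjacent, skip_one
```

To try it on a downloaded copy of one of the logs, open the file with gzip.open(path, "rt", errors="replace") and pass the file object to split_pass1_attempts, then feed the result to alternation_report. The local path is whatever the file was saved as. Comparing whole lines is crude, but it should be enough to show whether the attempts fall into two alternating groups.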
Can any part of this OST be salvaged?
Thanks,
Craig Prescott
UF HPC Center
From the initial fsck:
fsck.ext4: Group descriptors look bad... trying backup blocks...
Superblock has an invalid journal (inode 8).
Clear? yes
*** ext3 journal has been deleted - filesystem is now ext2 only ***
Superblock has_journal flag is clear, but a journal inode is present.
Clear? yes
Pass 1: Checking inodes, blocks, and sizes
Journal inode is not in use, but contains data. Clear? yes
Inodes that were part of a corrupted orphan linked list found. Fix? yes
Inode 32784385 was part of the orphaned inode list. FIXED.
Inode 32784385 has imagic flag set. Clear? yes
...
File ??? (inode #114786307, mod time Fri Oct 10 14:03:48 2008)
has 506488 multiply-claimed block(s), shared with 7 file(s):
??? (inode #114786319, mod time Fri Oct 10 14:03:48 2008)
... (inode #114786317, mod time Fri Oct 10 14:03:48 2008)
... (inode #114786315, mod time Fri Oct 10 14:03:48 2008)
??? (inode #114786313, mod time Fri Oct 10 14:03:48 2008)
... (inode #114786311, mod time Fri Oct 10 14:03:48 2008)
... (inode #114786309, mod time Fri Oct 10 14:03:48 2008)
??? (inode #114786305, mod time Fri Oct 10 14:03:48 2008)
Clone multiply-claimed blocks? yes
...