[Lustre-discuss] fsck of OST problems - endless loop restarting pass 1

Andreas Dilger adilger at sun.com
Tue Dec 1 15:50:18 PST 2009


On 2009-12-01, at 13:56, Craig Prescott wrote:
> We are running Lustre 1.8.1.1.  One of our two OSS nodes (12 OSTs)
> became unresponsive on Sunday night.  We issued an IPMI power cycle.
>
> After the node was back up, we tried to fsck the OSTs
> (e2fsprogs-1.41.6.sun1-0redhat.x86_64) with 'fsck -f -y'.  Eleven of  
> the twelve OSTs fsck'd normally.  The 12th OST showed heavy  
> corruption, with many inodes moved to /lost+found.  This fsck never  
> finished, and we killed it after ~14 hours.
>
> All further fsck attempts seem to endlessly get kicked back to pass 1
> after many zero dtime corrections, and relocating many group block
> bitmaps, inode bitmaps, and inode tables.  It seems that many of these
> changes are never written out to the filesystem, as we encounter the
> same corrections on subsequent pass 1 restarts.  Actually, it looks  
> like every *other* attempt to run pass 1 yields similar output, as  
> if fsck is bouncing back and forth between two solutions.
>
> We have tried e2fsprogs 1.41.6.sun1-0redhat and 1.41.9 from  
> sourceforge.
>   Logs (enormous) of the fsck attempts are available here:
>
> http://hpc.ufl.edu/logs/fsck.log.1.41.9.gz (2 full pass 1 fsck  
> attempts)
> http://hpc.ufl.edu/logs/fsck.log.1.41.6.gz (4 full pass 1 fsck  
> attempts)
>
> Can any part of this OST be salvaged?

It's possible, though given the volume of error messages I saw, I'm not
sure how much will be left.

I would start by simply trying to mount the OST filesystem as ldiskfs
directly (with "-o ro" to avoid any further corruption or errors, and
possibly also "noload" to avoid replaying the journal), then see whether
you can copy the data out into a backup filesystem and just reformat
the OST afterward.
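
Something along these lines, where the device name and mount point are
just placeholders for your own:

    mkdir -p /mnt/ost_recover
    mount -t ldiskfs -o ro,noload /dev/{ostdev} /mnt/ost_recover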

You should copy out the files with a tool that has xattr support, such
as rsync v3 or the RHEL tar with its --xattrs option.
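
For example, with rsync v3 (again, the paths are just placeholders):

    rsync -aXv /mnt/ost_recover/ /backup/ost_data/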

Failing that, you may be able to run e2fsck using a backup superblock
and group descriptors with "-B 4096 -b {blocknr}", where:

blocknr = 32768 * {3,5,7}^n

I don't think the first backup group descriptor is valid (that would  
be n=0 above, or 32768), so you could try (at random) 32768 * 3^2 =  
294912.
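
For example (the device name is a placeholder; you can also run it with
"-n" instead of "-y" first to see what it would change without writing
anything):

    e2fsck -f -y -B 4096 -b 294912 /dev/{ostdev}
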
If you can get it mounted at all, you should copy the data out.  If you
have a very new kernel you may be able to mount the filesystem as ext4
(so that you don't need to re-create the journal) to do the copy.
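
For instance (device and mount point are placeholders again):

    mount -t ext4 -o ro /dev/{ostdev} /mnt/ost_recover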

For the objects in the lost+found directory, ll_recover_lost_found_objs
will "rescue" these objects and put them back into the right directory
structure for Lustre to find them again.
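
Roughly like this (the mount point is a placeholder; the OST needs to be
mounted read-write at that point so the objects can be moved back into
place, and check the usage message for the exact options in your release):

    ll_recover_lost_found_objs -d /mnt/ost_recover/lost+found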

> From the initial fsck:
>
> fsck.ext4: Group descriptors look bad... trying backup blocks...
> Superblock has an invalid journal (inode 8).
> Clear? yes
>
> *** ext3 journal has been deleted - filesystem is now ext2 only ***
>
> Superblock has_journal flag is clear, but a journal inode is present.
> Clear? yes
>
> Pass 1: Checking inodes, blocks, and sizes
> Journal inode is not in use, but contains data.  Clear? yes
>
>
> Inodes that were part of a corrupted orphan linked list found.  Fix?  
> yes
>
> Inode 32784385 was part of the orphaned inode list.  FIXED.
> Inode 32784385 has imagic flag set.  Clear? yes
>
> ...
>
> File ??? (inode #114786307, mod time Fri Oct 10 14:03:48 2008)
>   has 506488 multiply-claimed block(s), shared with 7 file(s):
>         ??? (inode #114786319, mod time Fri Oct 10 14:03:48 2008)
>         ... (inode #114786317, mod time Fri Oct 10 14:03:48 2008)
>         ... (inode #114786315, mod time Fri Oct 10 14:03:48 2008)
>         ??? (inode #114786313, mod time Fri Oct 10 14:03:48 2008)
>         ... (inode #114786311, mod time Fri Oct 10 14:03:48 2008)
>         ... (inode #114786309, mod time Fri Oct 10 14:03:48 2008)
>         ??? (inode #114786305, mod time Fri Oct 10 14:03:48 2008)
> Clone multiply-claimed blocks? yes
>
> ...


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



