[Lustre-discuss] OST crashed after slow journal messages

Andreas Dilger adilger at sun.com
Fri Jan 1 22:13:22 PST 2010


On 2010-01-01, at 17:20, Erik Froese wrote:
> On Thu, Dec 31, 2009 at 4:52 PM, Andreas Dilger <adilger at sun.com>  
> wrote:
>> These are usually a sign that the back-end storage is overloaded,  
>> or somehow performing very slowly.  Maybe there was a RAID rebuild  
>> going on?
>
> I don't see any hardware failures or rebuild messages on the RAID. I  
> do see periodic media scans going on.

There definitely was corruption of the directories on the OST,  
consistent with what the kernel originally reported, but it looks like  
it was resolved with a minimum of further problems.  You probably want  
to run "ll_recover_lost_found_objs" on this OST to restore the  
objects from lost+found back to their correct locations, though that  
isn't strictly necessary to bring the OST back online - it just avoids  
losing access to the objects that were moved to lost+found due to the  
directory corruption.
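A minimal sketch of that recovery step, assuming the OST is stopped and using an illustrative mount point (the device name is taken from the log messages further down; adjust both to your setup):

```shell
# Sketch only -- run on the OSS with the OST not mounted as Lustre.
# /mnt/ost_tmp is an illustrative scratch mount point.
mount -t ldiskfs /dev/sdz /mnt/ost_tmp
# Scan lost+found and move objects back to their proper object
# directories, based on the object information stored in each
# inode's extended attributes (-v for verbose output).
ll_recover_lost_found_objs -v -d /mnt/ost_tmp/lost+found
umount /mnt/ost_tmp
```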

> Lustre and e2fsprogs versions:
>
> [root at oss-0-0 ~]# rpm -q kernel-lustre
> kernel-lustre-2.6.18-128.7.1.el5_lustre.1.8.1.1
> [root at oss-0-0 ~]# rpm -q e2fsprogs
> e2fsprogs-1.41.6.sun1-0redhat
>
>
> Then there's this interesting message:
> Dec 29 14:11:32 oss-0-0 kernel: LDISKFS-fs error (device sdz):  
> ldiskfs_lookup: unlinked inode 5384166 in dir #145170469
> Dec 29 14:11:32 oss-0-0 kernel: Remounting filesystem read-only
>
> This means the ldiskfs code found some corruption on disk, and  
> remounted
> the filesystem read-only to avoid further corruptions on disk.
>
>
> Whenever I try to mount the ost (known as /dev/dsk/ost24) I get the  
> following messages:
> Dec 29 19:25:35 oss-0-0 kernel: LDISKFS-fs error (device sdz):  
> ldiskfs_check_descriptors: Checksum for group 16303 failed (64812!=44)
> Dec 29 19:25:35 oss-0-0 kernel: LDISKFS-fs: group descriptors  
> corrupted!
>
> So it looks like the "group descriptors" are corrupted. I'm not sure  
> what those are, but e2fsck -n sure enough complains about them. So I  
> tried running it for real.
>
> I ran e2fsck -j /dev/$JOURNAL -v -fy -C 0 /dev/$DEVICE.
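For background: the group descriptors are the per-block-group metadata tables stored right after each superblock copy, and e2fsck can rebuild the primary copies from the backups when their checksums fail. This can be seen harmlessly on a throwaway image file, nowhere near the real OST (assumes e2fsprogs is installed; paths are illustrative):

```shell
# Scratch demo on a plain file -- no root or real device involved.
export PATH="$PATH:/sbin:/usr/sbin"
dd if=/dev/zero of=/tmp/gd-demo.img bs=1M count=16 2>/dev/null
mke2fs -q -F -b 1024 /tmp/gd-demo.img
# Each block group's descriptors sit right after a superblock copy;
# "e2fsck -b <block>" can fall back to any backup listed here.
dumpe2fs /tmp/gd-demo.img 2>/dev/null | grep -i 'superblock at'
```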
>
> The first time, it ran to what looked like completion. It printed a  
> summary but then didn't exit. I sent it a kill, but that didn't stop  
> it, so I let it run and went back to sleep for 3 hours. When I woke  
> up the process was gone, but I still get the same error messages.
>
> Having a log of the e2fsck errors would be helpful.
>
> I don't have the log for the first fsck. First it logged a bunch of  
> messages about the group descriptors and that it was repairing them.  
> Then messages about inodes. I'm attaching the logs from the second  
> e2fsck.
>
> I found this discussion http://lists.lustre.org/pipermail/lustre-discuss/2009-March/009885.html
> and tried the tune2fs command followed by the e2fsck, but it hasn't  
> exited yet (it's a 2.7 TB OST).
>
> It might take an hour or two, depending on how fast your storage is.
>
>
> The LUN comes from a Sun STK 6140/CSM200 device, which isn't  
> reporting any warnings, events, or errors.
>
> I deactivated the OST with lctl but it still shows up as active on  
> the clients. Also lfs find /scratch -O scratch-OST000e_UUID HANGS!
>
> You also need to deactivate it on the clients, at which point they  
> will
> get an IO error when accessing files on that OST.
>
> I didn't know that you had to disable it on the clients as well.  
> Thanks.
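The client-side step can be sketched like this (OST name taken from the commands above; syntax as in 1.8, where the parameter wildcard covers the per-client device name suffix):

```shell
# On each client: stop sending RPCs to the failed OST, so that
# accesses to objects on it return an IO error instead of hanging.
lctl set_param osc.scratch-OST000e-*.active=0
```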
>
>
> Are we screwed here? Is there a way to run lfs find with the OST  
> disabled? Shouldn't that just be a metadata operation?
>
>
> The size of a file is stored on the OSTs, so it depends on what you  
> are trying to do.  "lfs getstripe" can be run with a deactivated OST.
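For example (file path hypothetical): the stripe layout lives on the MDS, while the size requires talking to the OSTs holding the objects, so with the OST deactivated:

```shell
# Works: the stripe layout is metadata, fetched from the MDS only.
lfs getstripe /scratch/somefile
# Hangs or errors while the OST is down: reporting the size needs
# RPCs to every OST the file is striped over.
ls -l /scratch/somefile
```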
>
> <e2fsck-fy.sdz.out.post-mmp>


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



