[Lustre-discuss] OST crashed after slow journal messages
Andreas Dilger
adilger@sun.com
Thu Dec 31 13:52:33 PST 2009
On 2009-12-30, at 08:44, Erik Froese wrote:
> I had an OST crash (actually it made the entire OSS unresponsive to
> the point where I had to shoot it). There were messages in /var/log/
> messages complaining about slow journal performance (we have
> separate OSTs and journal disks).
>
> Dec 28 20:29:02 oss-0-0 kernel: LustreError: filter_commitrw_write() scratch-OST000e: slow direct_io 85s
> Dec 28 20:29:02 oss-0-0 kernel: LustreError: filter_commitrw_write() Skipped 58 previous similar messages
> Dec 28 21:50:13 oss-0-0 kernel: LustreError: fsfilt_commit_wait() scratch-OST000e: slow journal start 51s
> Dec 28 21:50:13 oss-0-0 kernel: LustreError: fsfilt_commit_wait() Skipped 66 previous similar messages
These are usually a sign that the back-end storage is overloaded, or
somehow performing very slowly. Maybe there was a RAID rebuild going on?
> Lustre and e2fsprogs versions:
>
> [root@oss-0-0 ~]# rpm -q kernel-lustre
> kernel-lustre-2.6.18-128.7.1.el5_lustre.1.8.1.1
> [root@oss-0-0 ~]# rpm -q e2fsprogs
> e2fsprogs-1.41.6.sun1-0redhat
>
>
> Then there's this interesting message:
> Dec 29 14:11:32 oss-0-0 kernel: LDISKFS-fs error (device sdz):
> ldiskfs_lookup: unlinked inode 5384166 in dir #145170469
> Dec 29 14:11:32 oss-0-0 kernel: Remounting filesystem read-only
This means the ldiskfs code found some corruption on disk, and remounted
the filesystem read-only to avoid further corruption.
> Whenever I try to mount the ost (known as /dev/dsk/ost24) I get the
> following messages:
> Dec 29 19:25:35 oss-0-0 kernel: LDISKFS-fs error (device sdz):
> ldiskfs_check_descriptors: Checksum for group 16303 failed (64812!=44)
> Dec 29 19:25:35 oss-0-0 kernel: LDISKFS-fs: group descriptors
> corrupted!
> So it looks like the "group descriptors" are corrupted. I'm not sure
> what those are but e2fsck -n sure enough complains about them. So I
> tried running it for real.
>
> I ran e2fsck -j /dev/$JOURNAL -v -fy -C 0 /dev/$DEVICE.
>
> The first time, it ran to what looked like completion. It printed a
> summary and all, but then didn't exit. I sent it a kill but that
> didn't stop it, so I let it run and went back to sleep for 3 hours.
> When I woke up the process was gone, but I still get the same error
> messages.
Having a log of the e2fsck errors would be helpful.
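
One way to keep such a log is to tee everything e2fsck prints into a file.
A sketch, demonstrated on a throwaway loopback image rather than the real
OST (substitute your actual journal and OST devices; the image path is
just an example):

```shell
# Stand-in image for the OST device; the real run would be
# e2fsck -j /dev/$JOURNAL -v -fy -C 0 /dev/$DEVICE
truncate -s 64M /tmp/ost24.img
mke2fs -q -F -j /tmp/ost24.img

# Run the check and capture everything it reports to a log file
e2fsck -fy /tmp/ost24.img 2>&1 | tee /tmp/e2fsck-ost24.log
```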
> I found this discussion http://lists.lustre.org/pipermail/lustre-discuss/2009-March/009885.html
> and tried the tune2fs command followed by the e2fsck, but it hasn't
> exited yet (it's a 2.7 TB OST)
It might take an hour or two, depending on how fast your storage is.
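
For corrupted group descriptors specifically, e2fsck can also be pointed
at one of the backup superblocks, which carry backup copies of the group
descriptors. A sketch on a scratch image, assuming a 4k block size (use
`mke2fs -n` to see where your filesystem's backups actually live):

```shell
# Scratch image standing in for the OST device
truncate -s 256M /tmp/gd-test.img
mke2fs -q -F -b 4096 /tmp/gd-test.img

# -n shows where the backup superblocks were placed, without reformatting
mke2fs -n -F -b 4096 /tmp/gd-test.img | grep -i superblock

# Check using the first backup superblock (block 32768 at 4k block size);
# e2fsck exits 1 when it had to write fixes back, which is expected here
e2fsck -fy -b 32768 -B 4096 /tmp/gd-test.img 2>&1 | tee /tmp/gd-fsck.log
```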
> The LUN comes from a Sun STK 6140/CSM200 device which isn't
> reporting any warning, events, or errors.
>
> I deactivated the OST with lctl but it still shows up as active on
> the clients. Also lfs find /scratch -O scratch-OST000e_UUID HANGS!
You also need to deactivate it on the clients, at which point they will
get an IO error when accessing files on that OST.
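
A sketch of deactivating the OSC on each client (the device number is a
placeholder, taken from the `lctl dl` output on that node):

```shell
# Find the osc device for the failed OST on this node
lctl dl | grep scratch-OST000e

# Deactivate by device number (first column of the lctl dl output)...
lctl --device <devno> deactivate

# ...or by name, using the 1.8 set_param syntax
lctl set_param osc.scratch-OST000e-osc*.active=0
```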
> Are we screwed here? Is there a way to run lfs find with the OST
> disabled? Shouldn't that just be a metadata operation?
The size of a file is stored on the OSTs, so it depends on what you
are trying to do. "lfs getstripe" can be run with a deactivated OST.
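
For locating files with objects on the dead OST without hanging, something
along these lines may work (the paths are examples, and --obd support
depends on the lfs version; OST000e is stripe index 14):

```shell
# Striping of a single file: the obdidx column lists OST indices,
# so entries showing 14 (0x000e) have objects on scratch-OST000e
lfs getstripe /scratch/some/file

# Recursive variant; with a recent enough lfs, --obd filters
# directly to files with objects on the given OST
lfs getstripe -r --obd scratch-OST000e_UUID /scratch
```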
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.