[Lustre-discuss] kernel freeze

Andreas Dilger adilger at sun.com
Thu Mar 20 09:05:28 PDT 2008


On Mar 20, 2008  13:48 +0100, Papp Tamás wrote:
> What could cause this error?
> Kernel: 2.6.9-42.0.10.EL_lustre-1.6.0.1custom-drbd and 
> 2.6.9-55.0.9.EL_lustre.1.6.4.1smp (CentOS 4.4)
> 
> After the node froze, its failover partner took over the resource, but 
> then it froze too.
> 
> I've just looked back through the logs and I see these "header is 
> corrupted" messages several more times over the last few days.
> After I turned it on again, it froze within 10 minutes.
> 
> 
> Mar 20 10:57:19 node2 kernel: LDISKFS-fs: header is corrupted!
> Mar 20 10:57:19 node2 kernel: LDISKFS-fs: invalid magic = 0x281e
> Mar 20 10:57:19 node2 kernel: LDISKFS-fs: header is corrupted!

This means you have on-disk corruption, and an "e2fsck -f" is needed
(with the filesystem unmounted, of course).
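
As a rough illustration of the "unmounted first, then e2fsck -f" order,
here is a minimal Python sketch. It is not from the original thread: the
device name /dev/drbd0, the mount point, and both helper functions are
hypothetical, and the optional "-n" (read-only, answer "no" to every
prompt) first pass is my own addition. It only inspects the mount table
of the local node, so on a failover pair the same check has to be made on
the partner node as well.

```python
def is_mounted(device, mounts_text):
    """Return True if `device` is a mount source in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # The first field of each /proc/mounts line is the mounted device.
        if fields and fields[0] == device:
            return True
    return False


def fsck_command(device, mounts_text, dry_run=True):
    """Build the e2fsck argument list, refusing if the device is mounted.

    dry_run=True adds -n, so e2fsck only reports problems (assumed
    first pass); dry_run=False gives the plain "e2fsck -f" from the reply.
    """
    if is_mounted(device, mounts_text):
        raise RuntimeError(device + " is still mounted; unmount it first")
    flags = "-fn" if dry_run else "-f"
    return ["e2fsck", flags, device]


# Hypothetical mount table, in the same layout as /proc/mounts.
SAMPLE_MOUNTS = (
    "/dev/sda1 / ext3 rw 0 0\n"
    "/dev/drbd0 /mnt/ost0 lustre rw 0 0\n"
)
```

On a real node the mount table would come from reading /proc/mounts, and
the returned list could be handed to subprocess.run(). Keeping the command
construction separate from its execution makes it easy to review exactly
what will be run against the device before anything touches the disk.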

> Mar 20 11:03:25 node2 kernel: ------------[ cut here ]------------
> Mar 20 11:03:25 node2 kernel: kernel BUG at 
> /usr/src/redhat/BUILD/lustre-1.6.0.1/lustre/ldiskfs/extents.c:1751!

You have quite an old version of Lustre, and several ldiskfs bugs have
been fixed since then.  I don't think newer versions will BUG() on
finding disk errors anymore.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



