[Lustre-discuss] OSS crash after LDISKFS-fs error

Fri Nov 9 12:33:52 PST 2007

Hi,

My Lustre environment is kernel 2.6.9-55.0.9.EL_lustre.1.6.3smp (Lustre 1.6.3).

One of my OSSs crashed today. Below are the messages it (storage09)
sent to syslog (the kernel entries at 19:08:44). It then died (my guess
is a kernel panic), and the heartbeat software on storage10 used STONITH
to reset it.

Nov  9 19:08:44 storage09.beowulf.cluster kernel: LDISKFS-fs error (device dm-5): mb_free_blocks: double-free of inode 38887437's block 155560192(bit 10496 in group 4747)
Nov  9 19:08:44 storage09.beowulf.cluster kernel:
Nov  9 19:08:44 storage09.beowulf.cluster kernel: Remounting filesystem read-only
Nov  9 19:08:44 storage09.beowulf.cluster kernel: LDISKFS-fs error (device dm-5): mb_free_blocks: double-free of inode 38887437's block 155560193(bit 10497 in group 4747)
Nov  9 19:09:13 storage10.beowulf.cluster heartbeat: [21231]: WARN: node storage09: is dead
Nov  9 19:09:13 storage10.beowulf.cluster heartbeat: [21231]: info: Link storage09:eth0 dead.
Nov  9 19:09:13 storage10.beowulf.cluster heartbeat: [21231]: info: Link storage09:eth2 dead.
Nov  9 19:09:13 storage10.beowulf.cluster heartbeat: [32414]: info: Resetting node storage09 with [external/ipmi ]

Do you know how serious these LDISKFS-fs errors are? Do they indicate
data corruption on that particular block device? Device dm-5 is a DDN
LUN. The DDN S2A9500 controller reports that everything is healthy there.
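
As a sanity check on the numbers in the two double-free messages, here is a
minimal sketch that parses them and compares the reported block number with
the reported group and bit. It assumes 4 KiB blocks (so 32768 blocks per
group) and a first data block of 0; I have not confirmed those values on
dm-5 (dumpe2fs -h on the device should show them). The function names are
only for illustration.

```python
import re

# Assumptions, not confirmed on the real device: 4 KiB blocks give
# 8 * 4096 = 32768 blocks per group, and the first data block is 0.
BLOCKS_PER_GROUP = 32768
FIRST_DATA_BLOCK = 0

# Matches the tail of the mb_free_blocks message as it appears in syslog.
PATTERN = re.compile(
    r"double-free of inode (\d+)'s block (\d+)\(bit (\d+) in group (\d+)\)"
)


def parse_double_free(message):
    """Return (inode, block, bit, group) from a double-free error line."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("not a double-free message")
    inode, block, bit, group = (int(x) for x in match.groups())
    return inode, block, bit, group


def expected_block(group, bit):
    """Block number implied by a group index and a bit within that group."""
    return FIRST_DATA_BLOCK + group * BLOCKS_PER_GROUP + bit


if __name__ == "__main__":
    line = (
        "LDISKFS-fs error (device dm-5): mb_free_blocks: double-free of "
        "inode 38887437's block 155560192(bit 10496 in group 4747)"
    )
    inode, block, bit, group = parse_double_free(line)
    # True when the logged block number agrees with group and bit.
    print(block == expected_block(group, bit))
```

Under those assumptions, both logged block numbers (155560192 and 155560193)
agree with bits 10496 and 10497 in group 4747, so the two messages refer to
adjacent blocks of the same inode.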

Cheers

Wojciech Turek

Mr Wojciech Turek
Assistant System Manager
University of Cambridge
High Performance Computing service
email: wjt27 at cam.ac.uk
tel. +441223763517
