[Lustre-discuss] OSS crash after LDISKFS-fs error
Wojciech Turek
wjt27 at cam.ac.uk
Fri Nov 9 12:33:52 PST 2007
Hi,
My Lustre environment: kernel 2.6.9-55.0.9.EL_lustre.1.6.3smp.
One of my OSSs (storage09) crashed today. Below are the messages it sent
to syslog (the first three entries). It then died (my guess is a kernel
panic), and the heartbeat software on its partner STONITHed it.
Nov 9 19:08:44 storage09.beowulf.cluster kernel: LDISKFS-fs error (device dm-5): mb_free_blocks: double-free of inode 38887437's block 155560192(bit 10496 in group 4747)
Nov 9 19:08:44 storage09.beowulf.cluster kernel: Nov 9 19:08:44 storage09.beowulf.cluster kernel: Remounting filesystem read-only
Nov 9 19:08:44 storage09.beowulf.cluster kernel: LDISKFS-fs error (device dm-5): mb_free_blocks: double-free of inode 38887437's block 155560193(bit 10497 in group 4747)
Nov 9 19:09:13 storage10.beowulf.cluster heartbeat: [21231]: WARN: node storage09: is dead
Nov 9 19:09:13 storage10.beowulf.cluster heartbeat: [21231]: info: Link storage09:eth0 dead.
Nov 9 19:09:13 storage10.beowulf.cluster heartbeat: [21231]: info: Link storage09:eth2 dead.
Nov 9 19:09:13 storage10.beowulf.cluster heartbeat: [32414]: info: Resetting node storage09 with [external/ipmi ]
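The block, group and bit numbers in the two double-free messages are at least internally consistent. Assuming a 4 KiB block size (so 8 * 4096 = 32768 blocks per group and a first data block of 0; neither value appears in the log), the absolute block is group * 32768 + bit, e.g. 4747 * 32768 + 10496 = 155560192. A minimal Python sketch that pulls the numbers out of the messages and checks that mapping; the regex and helper names are my own, for illustration only:

```python
import re

# Assumption: 4 KiB blocks, so each ldiskfs block group covers 8 * 4096 = 32768
# blocks, and s_first_data_block is 0 (true for 4 KiB-block ext3/ldiskfs).
BLOCKS_PER_GROUP = 8 * 4096
FIRST_DATA_BLOCK = 0

# Illustrative pattern for the mb_free_blocks double-free message quoted above.
DOUBLE_FREE = re.compile(
    r"double-free of inode (?P<inode>\d+)'s block (?P<block>\d+)"
    r"\(bit (?P<bit>\d+) in group (?P<group>\d+)\)"
)


def parse_double_free(line):
    """Return (inode, block, bit, group) from a double-free message, or None."""
    m = DOUBLE_FREE.search(line)
    if m is None:
        return None
    return tuple(int(m.group(k)) for k in ("inode", "block", "bit", "group"))


def block_from_group_bit(group, bit):
    """Absolute block number addressed by bit `bit` of group `group`'s bitmap."""
    return FIRST_DATA_BLOCK + group * BLOCKS_PER_GROUP + bit


lines = [
    "LDISKFS-fs error (device dm-5): mb_free_blocks: double-free of inode "
    "38887437's block 155560192(bit 10496 in group 4747)",
    "LDISKFS-fs error (device dm-5): mb_free_blocks: double-free of inode "
    "38887437's block 155560193(bit 10497 in group 4747)",
]

for line in lines:
    inode, block, bit, group = parse_double_free(line)
    # True when the reported block matches the group/bit it was found in.
    consistent = block_from_group_bit(group, bit) == block
    print(inode, block, group, bit, consistent)
```

Both lines come out consistent and refer to two adjacent blocks of the same inode. A check like this only confirms the reported numbers agree with each other; it says nothing about whether the on-disk bitmap really is damaged.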
How serious are LDISKFS-fs errors? Do they indicate data corruption on
that particular block device? Device dm-5 is a DDN LUN, and the DDN
S2A9500 controller reports that everything on it is healthy.
Cheers
Wojciech Turek
Mr Wojciech Turek
Assistant System Manager
University of Cambridge
High Performance Computing service
email: wjt27 at cam.ac.uk
tel. +441223763517