[Lustre-discuss] Changing error behaviour to kernel panic.
Roy Dragseth
roy.dragseth at uit.no
Mon Aug 31 01:56:25 PDT 2009
Hi.
We have a few problems with our storage hardware: once in a while we get file
system corruption on some OSTs. Lustre prefers to try to continue, but I/O
operations to the OSTs in question fail, making applications crash. I would
prefer a full hang to a partially working system, as a reboot is needed anyway
to fix the problem. The tune2fs manual says this can be done on a per-device
basis with the -e flag, which makes the kernel panic on errors.
So, my question is: will this have any severe side effects that I'm not aware
of? Are there any alternatives to this approach?
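
For reference, the tune2fs approach described above might look roughly like
the sketch below. The device path /dev/sdX, the OST_DEV and DRY_RUN variables,
and the helper function names are hypothetical placeholders, not taken from our
setup. Only the tune2fs -e and -l options come from the manual. The script
prints the commands by default instead of running them. Note also that a
mount-time errors= option, if one is set, takes precedence over the superblock
default that tune2fs writes, so this alone may not be sufficient.

```shell
#!/bin/sh
# Sketch only: OST_DEV, DRY_RUN and the helper names are hypothetical.
# By default the commands are printed, not executed.
# Set DRY_RUN=0 (and run as root) to actually apply them.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

set_errors_panic() {
    dev="$1"
    # Record "panic" as the on-error behaviour in the superblock.
    run tune2fs -e panic "$dev"
    # List the superblock fields; the "Errors behavior" line shows the setting.
    run tune2fs -l "$dev"
}

# /dev/sdX is a placeholder for the block device backing the OST.
set_errors_panic "${OST_DEV:-/dev/sdX}"
```

To apply it for real, one would run the script with DRY_RUN=0 and OST_DEV set
to the actual device, once per affected OST.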
System specs:
CentOS 5.2 / Lustre 1.6.7.1
(We're upgrading to Lustre 1.8.X in a few weeks.)
Just for the record, here is an example of the error message we get in dmesg:
LustreError: 16008:0:(filter_io_26.c:721:filter_commitrw_write()) error starting transaction: rc = -30
Any hints are greatly appreciated.
Regards,
r.
--
The Computer Center, University of Tromsø, N-9037 TROMSØ Norway.
phone:+47 77 64 41 07, fax:+47 77 64 41 00
Roy Dragseth, Team Leader, High Performance Computing
Direct call: +47 77 64 62 56. email: roy.dragseth at uit.no