[Lustre-discuss] Changing error behaviour to kernel panic.

Roy Dragseth roy.dragseth at uit.no
Mon Aug 31 01:56:25 PDT 2009


Hi.

We have a few problems with our storage hardware: once in a while we get file 
system corruption on some OSTs.  Lustre prefers to try to continue, but I/O 
operations to the affected OSTs fail, making applications crash.  I would 
prefer a full hang to a partially working system, since a reboot is needed 
anyway to fix the problem.  The tune2fs manual says this can be done on a 
per-device basis with the -e flag, which can make the kernel panic on errors.
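
To make the idea concrete, here is a rough sketch of the change I have in mind. 
It only builds the tune2fs command line and runs it on request. The helper 
names and the device path /dev/sdb are placeholders of mine, not our real 
setup, and I am assuming the OST backing device takes the usual -e values 
(continue, remount-ro, panic) listed in the tune2fs manual.

```python
import subprocess

# Error behaviours accepted by "tune2fs -e" according to its manual page.
VALID_BEHAVIOURS = ("continue", "remount-ro", "panic")


def tune2fs_error_cmd(device, behaviour="panic"):
    """Build the tune2fs command that sets the on-error behaviour.

    `device` is a placeholder path such as /dev/sdb (hypothetical).
    """
    if behaviour not in VALID_BEHAVIOURS:
        raise ValueError("unsupported error behaviour: %r" % (behaviour,))
    return ["tune2fs", "-e", behaviour, device]


def set_error_behaviour(device, behaviour="panic", dry_run=True):
    """Return the command as a string; run it only when dry_run is False."""
    cmd = tune2fs_error_cmd(device, behaviour)
    if not dry_run:
        # Needs root and a device that is safe to modify.
        subprocess.run(cmd, check=True)
    return " ".join(cmd)


if __name__ == "__main__":
    # Dry run only: prints the command without touching any device.
    print(set_error_behaviour("/dev/sdb"))
```

The current setting should be visible afterwards in the "Errors behavior" 
line of the tune2fs -l output for the same device.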

So, my question is:  Will this have any severe side-effects that I'm not aware 
of?  Any other alternatives to this approach?

System specs:
CentOS 5.2 / Lustre 1.6.7.1
(We're upgrading to Lustre 1.8.x in a few weeks.)


Just for the record, here is an example of the error message we get in dmesg:

LustreError: 16008:0:(filter_io_26.c:721:filter_commitrw_write()) error starting transaction: rc = -30
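
For anyone reading the rc value: as far as I can tell it is a negated errno, 
and 30 is EROFS on Linux, which would be consistent with the backing file 
system having gone read-only after the error. A small sketch for decoding 
such values (the function name describe_rc is just mine):

```python
import errno
import os


def describe_rc(rc):
    """Map a negative kernel-style return code to (errno name, message)."""
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # rc = -30 is taken from the LustreError line above.
    print(describe_rc(-30))
```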

Any hints are greatly appreciated.

Regards,
r.

-- 

  The Computer Center, University of Tromsø, N-9037 TROMSØ Norway.
	      phone:+47 77 64 41 07, fax:+47 77 64 41 00
        Roy Dragseth, Team Leader, High Performance Computing
	 Direct call: +47 77 64 62 56. email: roy.dragseth at uit.no


