[Lustre-discuss] Changing error behaviour to kernel panic.

Mon Aug 31 06:01:46 PDT 2009

On Mon, 2009-08-31 at 10:56 +0200, Roy Dragseth wrote:
> Hi.

Hi,

> We have a few problems with our storage hw where we get file system corruption
> on some OSTs once in a while.  Lustre prefers to try to continue, but I/O
> operations to the OSTs in question fail, making applications crash.  I would
> prefer to have a full hang instead of a partially working system, as a reboot
> is needed anyway to fix the problem.  The tune2fs manual says this can be done
> using the -e flag on a per-device basis, making the kernel panic on errors.

What you are looking for is the "errors=panic" mount option.  It can be
set at mount time with "-o errors=panic" or, I believe, it can be set
permanently in the device's configuration.
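
As a rough sketch of the two approaches, assuming an ldiskfs/ext3-style
backing device: the device path and mount point below are placeholders,
and the "-t lustre" form of the mount command is my assumption about how
the OST is mounted.  The script only prints the commands so they can be
reviewed before anything is run as root.

```shell
#!/bin/sh
# Sketch only: DEV and MNT are placeholders, not values from this thread.
DEV=${DEV:-/dev/sdX}
MNT=${MNT:-/mnt/ost0}

# Print the commands instead of running them, so they can be reviewed first.
show_panic_cmds() {
    # One-off: ask for a panic on errors at mount time.
    # ("-t lustre" is an assumption about how the target is mounted.)
    echo "mount -t lustre -o errors=panic $1 $2"
    # Persistent: store the error behaviour in the superblock (tune2fs -e).
    echo "tune2fs -e panic $1"
    # Check which behaviour is currently recorded in the superblock.
    echo "tune2fs -l $1 | grep -i 'errors behavior'"
}

show_panic_cmds "$DEV" "$MNT"
```

Running it with DEV and MNT set in the environment prints the three
commands for that device; nothing is changed on disk until you run them
yourself.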

> So, my question is:  Will this have any severe side-effects that I'm not aware 
> of?

Well, if what you are looking for is a node to completely halt when a
corruption is detected, this will certainly do it.

> LustreError: 16008:0:(filter_io_26.c:721:filter_commitrw_write()) error starting 
> transaction: rc = -30

Yes, the event that made the backing store read-only (rc = -30, which is
-EROFS) will panic the node instead.  That gives something like HA on
another node the opportunity to start serving the target.  Of course,
that is only useful if there is not really any corruption.  If the
target really is corrupted, you ought to fix that first.  You shouldn't
just go on, letting corruption pile up.
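
If you want to spot this condition in the logs before deciding what to
do, something like the following might help.  It is only a sketch: the
function name is made up, and it simply looks for the "rc = -30" text
from the message quoted above (-30 being -EROFS, "Read-only file
system", on Linux).

```shell
#!/bin/sh
# Sketch: succeed if the log text on stdin contains an "rc = -30" error.
# has_erofs is a hypothetical helper name, not a Lustre tool.
has_erofs() {
    # Match "rc = -30" but not longer numbers such as "rc = -300".
    grep -Eq 'rc = -30([^0-9]|$)'
}
```

For example, "dmesg | has_erofs && echo 'backing store went read-only'"
would report it on a node whose kernel log contains that error.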

b.
