[Lustre-discuss] Node randomly panic

Steden Klaus Klaus.Steden at thomson.net
Mon Nov 26 17:05:18 PST 2007


I use netdump for this, but I don't always get complete core images after a crash. Tweaking the wait time before reboot doesn't seem to give the core image enough time to transfer completely, so YMMV.
 
I don't think there are any debugging symbols in the pre-built kernels, either, so you'd have to compile a debugging kernel version in order to dissect crashes.
 
hth,
Klaus

________________________________

From: lustre-discuss-bounces at clusterfs.com on behalf of Johann Lombardi
Sent: Mon 11/26/2007 7:37 AM
To: Somsak Sriprayoonsakul
Cc: lustre-discuss at clusterfs.com
Subject: Re: [Lustre-discuss] Node randomly panic



On Mon, Nov 26, 2007 at 08:49:56PM +0700, Somsak Sriprayoonsakul wrote:
> Could you tell me how to dump the whole crash log to a file? It doesn't
> appear in /var/log/messages. I have only seen it once, actually. That's why I
> don't know the function name :) But the whole screen was something
> related to Lustre for sure.

You should set up serial consoles (or netconsole). A crash dump utility
(netdump, LKCD, ...) is also very useful.

> Note that the dump log is longer than one screen, so taking a photo
> wouldn't help (I think).

If /proc/sys/kernel/panic_on_oops is set to 1 on the OSS, you could try setting
it to 0 so that the node stays up after an oops, then log in to the node and
get the stack trace via dmesg before rebooting it.
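
A minimal sketch of the panic_on_oops change described above, in Python. The
helper names read_flag and disable_panic_on_oops are hypothetical, made up for
this illustration; only the /proc path comes from the message. Writing to that
file needs root, and the change does not survive a reboot.

```python
from pathlib import Path

# Path named in the message above; writing to it requires root.
PANIC_ON_OOPS = Path("/proc/sys/kernel/panic_on_oops")


def read_flag(path=PANIC_ON_OOPS):
    """Return the integer currently stored in the sysctl file."""
    return int(Path(path).read_text().strip())


def disable_panic_on_oops(path=PANIC_ON_OOPS):
    """Write 0 to the sysctl file if it is not already 0.

    Returns the value the file held before the call, so the caller
    can restore it later if desired.
    """
    previous = read_flag(path)
    if previous != 0:
        Path(path).write_text("0\n")
    return previous
```

Calling disable_panic_on_oops() as root on the OSS should leave the flag at 0.
After the next oops, running dmesg on the node and redirecting its output to a
file is one way to keep the whole trace, provided the node is still responsive.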

Johann

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss at clusterfs.com
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss




