[Lustre-discuss] Node randomly panic
Somsak Sriprayoonsakul
somsak_sr at thaigrid.or.th
Mon Nov 26 19:33:08 PST 2007
Thanks for all the comments. I'll try netdump and see how it goes. It'll
take a while, but I'll be back.
Anyway, could someone answer my second and third questions?
- Is RECOVERING enough? Should we run e2fsck + lfsck every time Lustre
fails?
- Quota is turned off when *any* OSS node fails. Is there any way to
keep it "always on"?
BTW, when I turn quota back on, the quota settings sometimes go wrong:
some OSSes have only 1 byte while the others have the proper value. We
work around this by resetting the quota to all zeros and setting it
again. Is this normal?
Johann Lombardi wrote:
> On Mon, Nov 26, 2007 at 08:49:56PM +0700, Somsak Sriprayoonsakul wrote:
>
>> Could you tell me how to dump the whole crash log to a file? It doesn't
>> appear in /var/log/messages. I've only seen it once, actually. That's why I
>> don't know the function name :) But the whole screen was something
>> related to Lustre for sure.
>>
>
> You should set up serial consoles (or netconsole). A crash dump utility
> (netdump, LKCD, ...) is also very useful.
>
>
>> Note that the dump log is longer than one screen, so taking a photo
>> wouldn't help (I think).
>>
>
> If /proc/sys/kernel/panic_on_oops is set to 1 on the OSS, you could try to set
> it to 0 and to log onto the node to get the stack trace via dmesg before
> rebooting it.
>
> Johann
>
>
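For reference, the panic_on_oops suggestion quoted above could be
scripted roughly as follows. This is only a minimal sketch, not a tested
procedure for our OSS nodes: the helper names (read_flag,
disable_panic_on_oops, save_dmesg) and the output file name are made up
for illustration, writing to /proc requires root, and it assumes Python
3.7+ and that the node stays reachable after the oops.

```python
import subprocess
from pathlib import Path

# Path named in the quoted advice; everything else here is hypothetical.
PANIC_ON_OOPS = Path("/proc/sys/kernel/panic_on_oops")


def read_flag(path=PANIC_ON_OOPS):
    """Return the current integer value stored in the sysctl file."""
    return int(Path(path).read_text().strip())


def disable_panic_on_oops(path=PANIC_ON_OOPS):
    """Write 0 so an oops does not panic the node; return the old value.

    Needs root when pointed at the real /proc file.
    """
    previous = read_flag(path)
    if previous != 0:
        Path(path).write_text("0\n")
    return previous


def save_dmesg(out_file, cmd=("dmesg",)):
    """Run dmesg (or a stand-in command) and save its output to out_file.

    Returns the number of characters written.
    """
    result = subprocess.run(list(cmd), capture_output=True, text=True, check=True)
    Path(out_file).write_text(result.stdout)
    return len(result.stdout)
```

The idea would be to call disable_panic_on_oops() once on the OSS ahead
of time, then, after the next oops, log in and call something like
save_dmesg("oss-oops.txt") before rebooting the node.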
--
-----------------------------------------------------------------------------------
Somsak Sriprayoonsakul
Thai National Grid Center
Software Industry Promotion Agency
Ministry of ICT, Thailand
somsak_sr at thaigrid.or.th
-----------------------------------------------------------------------------------