[Lustre-discuss] Node randomly panic

Somsak Sriprayoonsakul somsak_sr at thaigrid.or.th
Mon Nov 26 19:33:08 PST 2007


Thanks for all the comments. I'll try netdump and see how it goes. It will
take a while, but I'll report back.

Anyway, could someone answer my second and third questions?

- Is RECOVERING enough, or should we run e2fsck + lfsck every time Lustre
fails?
 
- Quota is turned off when *any* OSS node fails. Is there any way to
keep it "always on"?

BTW, when I turn quota back on, the quota settings sometimes go wrong:
some OSSes end up with a limit of only 1 byte while the others have the
proper value. We work around this by resetting the quota to all zeros
and then setting it again. Is this normal?

Johann Lombardi wrote:
> On Mon, Nov 26, 2007 at 08:49:56PM +0700, Somsak Sriprayoonsakul wrote:
>   
>> Could you tell me how to dump the whole crash log to a file? It doesn't
>> appear in /var/log/messages. I've actually only seen it once, which is why I
>> don't know the function name :) But everything on the screen was
>> Lustre-related, for sure.
>>     
>
> You should set up serial consoles (or netconsole). A crash dump utility
> (netdump, LKCD, ...) is also very useful.
>
>   
>> Note that the dump is longer than one screen, so taking a photo
>> wouldn't help (I think).
>>     
>
> If /proc/sys/kernel/panic_on_oops is set to 1 on the OSS, you could try to set
> it to 0 and log onto the node to get the stack trace via dmesg before
> rebooting it.
>
> Johann
>
>   
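For reference, here is a minimal Python sketch of Johann's panic_on_oops suggestion. It is an illustration under assumptions, not part of Lustre or of any tool mentioned in this thread: the helper names `panics_on_oops`, `disable_panic_on_oops` and `save_dmesg` are hypothetical. It assumes the standard Linux sysctl file /proc/sys/kernel/panic_on_oops (0 means the kernel tries to continue after an oops, nonzero means it panics) and that it runs as root on the OSS. Because the trace only lives in the kernel ring buffer, the last helper copies `dmesg` output to a local file so it survives the reboot.

```python
import subprocess
from pathlib import Path

# Standard procfs location of the kernel.panic_on_oops sysctl.
PANIC_ON_OOPS = Path("/proc/sys/kernel/panic_on_oops")


def panics_on_oops(path: Path = PANIC_ON_OOPS) -> bool:
    """Return True if the kernel is set to panic when an oops occurs."""
    return path.read_text().strip() != "0"


def disable_panic_on_oops(path: Path = PANIC_ON_OOPS) -> bool:
    """Write 0 so the next oops is logged instead of panicking the node.

    Needs root on a real node. Returns the previous setting so the caller
    can restore it later.
    """
    previous = panics_on_oops(path)
    path.write_text("0\n")
    return previous


def save_dmesg(dest: Path) -> None:
    """Copy the kernel ring buffer (including any oops trace) to a file.

    Run this after the oops and before rebooting, so the stack trace is
    kept on disk even though it never reached /var/log/messages.
    """
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    dest.write_text(result.stdout)
```

Usage on the OSS, as root: call `disable_panic_on_oops()` once, and after the next oops call `save_dmesg(Path("/root/oops-trace.txt"))` (the destination path is just an example). A serial console or netconsole is still the more reliable option, since an oops inside a storage driver may leave the node unable to write to its local disk.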
-- 

-----------------------------------------------------------------------------------
Somsak Sriprayoonsakul

Thai National Grid Center
Software Industry Promotion Agency
Ministry of ICT, Thailand
somsak_sr at thaigrid.or.th
-----------------------------------------------------------------------------------
