[Lustre-discuss] Possible out of memory condition

Craig Tierney Craig.Tierney at noaa.gov
Mon Oct 27 12:51:13 PDT 2008


Andreas Dilger wrote:
> On Oct 27, 2008  10:16 -0600, Craig Tierney wrote:
>> Andreas Dilger wrote:
>>> Note that soft lockups are only a warning.  It shouldn't mean that the
>>> node is completely dead, only that some thread was hogging the CPU.
>> The two soft lockup messages (one in kswapd0 and the other in the user
>> process convert_emiss) repeated their messages for 6 hours before I rebooted
>> the node.  I don't recall if I could log in to the node or not.
> 
> Ah, then the spewing of the "warning" messages is likely what caused the
> node to be unusable :-(.  Console messages are printed with all interrupts
> disabled and can be a problem in such cases.  Unfortunately, this printing
> is outside of the Lustre code so we can't fix it without patching the kernel.
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
> 



Thanks for your responses.  This last one is very informative, and I
can look into patching the kernel if the problem recurs.

Thanks,
Craig


-- 
Craig Tierney (craig.tierney at noaa.gov)


