[Lustre-discuss] Possible out of memory condition
Craig Tierney
Craig.Tierney at noaa.gov
Mon Oct 27 12:51:13 PDT 2008
Andreas Dilger wrote:
> On Oct 27, 2008 10:16 -0600, Craig Tierney wrote:
>> Andreas Dilger wrote:
>>> Note that soft lockups are only a warning. It shouldn't mean that the
>>> node is completely dead, only that some thread was hogging the CPU.
>> The two soft lockup messages (one in kswapd0 and the other in the user
>> process convert_emiss) kept repeating for 6 hours before I rebooted
>> the node. I don't recall whether I could log in to the node or not.
>
> Ah, then the spewing of the "warning" messages is likely what caused the
> node to be unusable :-(. Console messages are printed with all interrupts
> disabled and can be a problem in such cases. Unfortunately, this printing
> is outside of the Lustre code so we can't fix it without patching the kernel.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>
Thanks for your responses. This last one is very informative, and if
the problem recurs I can look into patching the kernel to work around
it.
Thanks,
Craig
--
Craig Tierney (craig.tierney at noaa.gov)