[Lustre-discuss] Possible out of memory condition
Craig.Tierney at noaa.gov
Mon Oct 27 09:16:39 PDT 2008
Andreas Dilger wrote:
> On Oct 22, 2008 14:37 -0600, Craig Tierney wrote:
>> I just had two nodes hang with the following soft lockup messages.
>> I am running Centos 5.2 (2.6.18-93.1.13.el5) with the patchless client
>> (18.104.22.168). My nodes do not have swap configured on them (no local
>> disks). We do have a tool that looks for out of memory condition
>> and neither of the nodes in question reported a problem (not that it
>> is perfect).
> Note that soft lockups are only a warning. It shouldn't mean that the
> node is completely dead, only that some thread was hogging the CPU.
The two soft lockup messages (one from kswapd0, the other from the user
process convert_emiss) kept repeating for 6 hours before I rebooted
the node. I don't recall whether I could still log in to the node.
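For what it's worth, a quick way to see how often (and for how long) the reports repeated is to tally them from the saved syslog. This is just a sketch; /var/log/messages is an assumed log path for CentOS 5, so adjust for your syslog setup:

```shell
# Tally soft-lockup reports by the task that triggered them.
# /var/log/messages is an assumed log path; adjust as needed.
grep 'BUG: soft lockup' /var/log/messages \
  | sed 's/.*stuck for [0-9]*s! //' \
  | sort | uniq -c | sort -rn
```

A run of identical entries for the same task (e.g. [kswapd0:418]) over hours, rather than a one-off warning, is what distinguishes a stuck thread from a transient CPU hog.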
>> Does the problem look like an issue with Lustre?
> Lots of Lustre functions on the stack...
>> Oct 22 08:06:45 h53 kernel: BUG: soft lockup - CPU#2 stuck for 10s! [kswapd0:418]
>> Oct 22 08:06:45 h53 kernel: Call Trace:
>> Oct 22 08:06:45 h53 kernel: [<ffffffff8871125a>] :osc:cache_remove_extent+0x4a/0x90
>> Oct 22 08:06:45 h53 kernel: [<ffffffff88707c5a>] :osc:osc_teardown_async_page+0x25a/0x3c0
> Do you have particularly large files in use (e.g. in the realm of 1TB or
> more)? It seems possible that if there are a lot of pages to be cleaned
> up that this might cause a report like this.
My first guess is no, we don't create files that large. But it is
entirely possible a user misused this code in a way that produced some
very large files (appending instead of creating). I will check it out.
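One way to check that guess is to scan the mount for files approaching 1TB. A minimal sketch, assuming GNU find (for the G size suffix and -printf); /mnt/lustre is a placeholder for the actual mount point:

```shell
#!/bin/sh
# Sketch: list files at or above ~1TB under a mount point, largest first.
# /mnt/lustre is a placeholder mount point; GNU find is assumed.
MOUNT=${1:-/mnt/lustre}
find "$MOUNT" -type f -size +1024G -printf '%s\t%p\n' 2>/dev/null | sort -rn
```

Walking a large Lustre namespace this way can take a while and loads the MDS, so it's best run off-hours.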
> Cheers, Andreas
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
Craig Tierney (craig.tierney at noaa.gov)