[Lustre-discuss] Lustre Client - Memory Issue

Dmitry Zogin dmitry.zoguine at oracle.com
Mon Aug 30 18:50:33 PDT 2010


Actually, there was a bug fixed in 1.8.4 where obdo structures could be
allocated and freed outside of the OBDO_ALLOC/OBDO_FREE macros. That could
lead to slab fragmentation and a pseudo-leak.
The patch is in attachment 30664 for bz 21980.
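
If you want to double-check whether a given client already carries that fix, comparing the running version against 1.8.4 is enough; something along these lines should do it (a rough sketch, assuming lctl is in the path):

    # Print the Lustre version the client modules report; 1.8.4 or later
    # should include the bz 21980 fix.
    lctl get_param version 2>/dev/null || cat /proc/fs/lustre/version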

Dmitry


Andreas Dilger wrote:
> On 2010-08-26, at 18:42, Jagga Soorma wrote:
>   
>> I am still running into this issue on some nodes:
>>
>> client109: ll_obdo_cache          0 152914489    208   19    1 : tunables  120   60    8 : slabdata      0 8048131      0
>> client102: ll_obdo_cache          0 308526883    208   19    1 : tunables  120   60    8 : slabdata      0 16238257      0
>>
>> How can I calculate how much memory this is holding on to?
>>     
>
> If you do "head -1 /proc/slabinfo" it reports the column descriptions.
>
> The "slabdata" section reports num_slabs=16238257 and pagesperslab=1, so this is 16238257 pages of memory, or about 64GB of RAM on client102.  Ouch.
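>
> As a quick cross-check, the following one-liner should give the same figure when run directly on a client (a sketch, assuming the usual 4 KiB page size and the slabinfo 2.x column order shown above):
>
>     # pages held by the cache = num_slabs * pagesperslab; convert to GiB
>     awk '$1 == "ll_obdo_cache" { pages = $15 * $6; printf "%d pages = %.1f GiB\n", pages, pages * 4 / 1048576 }' /proc/slabinfo
>
> On client102 that works out to 16238257 * 4 KiB, i.e. roughly 62 GiB, in line with the estimate above.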
>
>   
>>  My system shows a lot of memory being used up, but none of the jobs are using that much memory.  Also, these clients are running an SMP SLES 11 kernel, but I can't find any /sys/kernel/slab directory.
>>
>> Linux client102 2.6.27.29-0.1-default #1 SMP 2009-08-15 17:53:59 +0200 x86_64 x86_64 x86_64 GNU/Linux
>>
>> What makes you say that this does not look like a Lustre memory leak?  I thought all the ll_* objects in slabinfo were Lustre-related?
>>     
>
> It's true that the ll_obdo_cache objects are allocated by Lustre, but the above data shows 0 of those objects in use, so the kernel _should_ be freeing the unused slab objects.  This particular data type (obdo) is only ever in use temporarily during system calls on the client, and should never be allocated for a long time.
>
> For some reason the kernel is not freeing the empty slab pages.  That is the responsibility of the kernel, and not Lustre.
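>
> It is also worth confirming that this memory really is sitting in the slab allocator and not attributed to something else; /proc/meminfo breaks that out (a quick check, nothing Lustre-specific):
>
>     # Slab is the total slab memory; SUnreclaim is the part the kernel does
>     # not consider reclaimable, which is where I would expect these
>     # empty-but-unreleased pages to show up.
>     grep -E '^(Slab|SReclaimable|SUnreclaim)' /proc/meminfo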
>
>   
>>  To me it looks like Lustre is holding on to this memory, but I don't know much about Lustre internals.
>>
>> Also, the memused values on these systems are:
>>
>> client102: 2353666940
>> client109: 2421645924
>>     
>
> This shows that Lustre is actively using about 2.4GB of memory allocations.  It is not tracking the 64GB of memory in the obdo_cache slab, because it has freed that memory (even though the kernel has not freed those pages).
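>
> For reference, memused is a plain byte counter of Lustre's own allocations, so converting it is just arithmetic (a small sketch; on 1.8 clients the counter usually lives at /proc/sys/lustre/memused):
>
>     # Convert the Lustre memused counter from bytes to GiB
>     awk '{ printf "%.2f GiB\n", $1 / 1024 / 1024 / 1024 }' /proc/sys/lustre/memused
>
> which gives about 2.2-2.3 GiB for the two clients above.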
>
>   
>> Any help would be greatly appreciated.
>>     
>
> The only suggestion I have is that if you unmount Lustre and unload the modules (lustre_rmmod) it will free up this memory.  Otherwise, searching for problems with the slab cache on this kernel may turn up something.
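>
> In other words, something along these lines (a sketch; /mnt/lustre is just a placeholder for your actual mount point):
>
>     # Unmount the filesystem and unload all Lustre/LNET modules; the empty
>     # slab caches are destroyed with the modules and their pages go back
>     # to the free pool.
>     umount /mnt/lustre
>     lustre_rmmod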
>
>   
>> On Wed, May 19, 2010 at 8:08 AM, Dmitry Zogin <dmitry.zoguine at oracle.com> wrote:
>> Hello Jagga,
>>
>> I checked the data, and indeed this does not look like a Lustre memory leak, but rather slab fragmentation, which suggests there might be a kernel issue here. From the slabinfo (I only keep the first few columns here):
>>
>>
>> name            <active_objs> <num_objs> <objsize>
>> ll_obdo_cache          0 452282156    208
>>
>> This means that there are no active objects, but the memory pages are not released from the slab allocator back to the free pool (the num_objs value is huge). That looks like slab fragmentation; you can find more background at
>> http://kerneltrap.org/Linux/Slab_Defragmentation
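>>
>> To see how quickly the cache grows back after being trimmed, something as simple as the following is usually enough (just a suggestion, nothing Lustre-specific):
>>
>>     # Sample the ll_obdo_cache line once a minute with a timestamp
>>     while true; do date; grep ll_obdo_cache /proc/slabinfo; sleep 60; done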
>>
>> Checking your mails, I wonder whether this only happens on clients that have SLES11 installed. As the RAM size is around 192GB, I assume they are NUMA systems?
>> If so, SLES11 has defrag_ratio tunables in /sys/kernel/slab/xxx.
>> From the source of get_any_partial():
>>
>> #ifdef CONFIG_NUMA
>>
>>         /*
>>          * The defrag ratio allows a configuration of the tradeoffs between
>>          * inter node defragmentation and node local allocations. A lower
>>          * defrag_ratio increases the tendency to do local allocations
>>          * instead of attempting to obtain partial slabs from other nodes.
>>          *
>>          * If the defrag_ratio is set to 0 then kmalloc() always
>>          * returns node local objects. If the ratio is higher then kmalloc()
>>          * may return off node objects because partial slabs are obtained
>>          * from other nodes and filled up.
>>          *
>>          * If /sys/kernel/slab/xx/defrag_ratio is set to 100 (which makes
>>          * defrag_ratio = 1000) then every (well almost) allocation will
>>          * first attempt to defrag slab caches on other nodes. This means
>>          * scanning over all nodes to look for partial slabs which may be
>>          * expensive if we do it every time we are trying to find a slab
>>          * with available objects.
>>          */
>>
>> Could you please verify that your clients have the defrag_ratio tunable and try various values?
>> A value of 100 should be the best, unless there is a bug, in which case maybe even 0 gets the desired result.
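>>
>> Something like the sketch below would do it; note that I am assuming SLUB is enabled (otherwise /sys/kernel/slab does not exist at all), and the attribute may be named either defrag_ratio or remote_node_defrag_ratio depending on the kernel:
>>
>>     # Show the current setting for every cache that exposes it
>>     grep . /sys/kernel/slab/*/defrag_ratio /sys/kernel/slab/*/remote_node_defrag_ratio 2>/dev/null
>>
>>     # Try a different value, e.g. 100 (or 0), for the obdo cache
>>     echo 100 > /sys/kernel/slab/ll_obdo_cache/remote_node_defrag_ratio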
>>
>> Best regards,
>> Dmitry
>>
>>
>> Jagga Soorma wrote:
>>     
>>> Hi Johann,
>>>
>>> I am actually using 1.8.1 and not 1.8.2:
>>>
>>> # rpm -qa | grep -i lustre
>>> lustre-client-1.8.1.1-2.6.27.29_0.1_lustre.1.8.1.1_default
>>> lustre-client-modules-1.8.1.1-2.6.27.29_0.1_lustre.1.8.1.1_default
>>>
>>> My kernel version on the SLES 11 clients is:
>>> # uname -r
>>> 2.6.27.29-0.1-default
>>>
>>> My kernel version on the RHEL 5.3 mds/oss servers is:
>>> # uname -r
>>> 2.6.18-128.7.1.el5_lustre.1.8.1.1
>>>
>>> Please let me know if you need any further information.  I am still trying to get the user to help me run his app so that I can run the leak finder script to capture more information.
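>>>
>>> In case it helps, the way I understand the leak-finder procedure is roughly the following (a sketch based on leak_finder.pl from the Lustre tests directory; please correct me if the steps differ):
>>>
>>>     # Enable allocation tracing, reproduce the workload, then feed the
>>>     # debug log to the script.
>>>     echo +malloc > /proc/sys/lnet/debug
>>>     #  ... run the application ...
>>>     lctl dk /tmp/lustre-debug.log
>>>     perl leak_finder.pl /tmp/lustre-debug.log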
>>>
>>> Regards,
>>> -Simran
>>>
>>> On Tue, Apr 27, 2010 at 7:20 AM, Johann Lombardi <johann at sun.com> wrote:
>>> Hi,
>>>
>>> On Tue, Apr 20, 2010 at 09:08:25AM -0700, Jagga Soorma wrote:
>>>       
>>>> Thanks for your response. I will try to run the leak-finder script and
>>>> hopefully it will point us in the right direction. This only seems to be
>>>> happening on some of my clients:
>>>>         
>>> Could you please tell us what kernel you use on the client side?
>>>
>>>       
>>>>    client104: ll_obdo_cache          0 433506280    208   19    1 : tunables  120   60    8 : slabdata      0 22816120      0
>>>>    client116: ll_obdo_cache          0 457366746    208   19    1 : tunables  120   60    8 : slabdata      0 24071934      0
>>>>    client113: ll_obdo_cache          0 456778867    208   19    1 : tunables  120   60    8 : slabdata      0 24040993      0
>>>>    client106: ll_obdo_cache          0 456372267    208   19    1 : tunables  120   60    8 : slabdata      0 24019593      0
>>>>    client115: ll_obdo_cache          0 449929310    208   19    1 : tunables  120   60    8 : slabdata      0 23680490      0
>>>>    client101: ll_obdo_cache          0 454318101    208   19    1 : tunables  120   60    8 : slabdata      0 23911479      0
>>>>    --
>>>>
>>>>    Hopefully this should help. Not sure which application might be causing
>>>>    the leaks. Currently R is the only app that users seem to be using
>>>>    heavily on these clients. Will let you know what I find.
>>>>         
>>> Tommi Tervo has filed a bugzilla ticket for this issue, see
>>> https://bugzilla.lustre.org/show_bug.cgi?id=22701
>>>
>>> Could you please add a comment to this ticket to describe the
>>> behavior of the application "R" (fork many threads, write to
>>> many files, use direct i/o, ...)?
>>>
>>> Cheers,
>>> Johann
>>>
>>>
>>>
>>>   
>>>
>>>       
>>     
>
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Technical Lead
> Oracle Corporation Canada Inc.
>
>   
