[Lustre-discuss] lustre ram usage (contd)

Mark Seger Mark.Seger at hp.com
Mon Dec 24 08:38:20 PST 2007


In my opinion there are a couple of problems with cron jobs that do 
monitoring.  On the positive side they're quick and easy, but on the 
downside you have extra work to do if you want timestamps, and then 
there's the issue of all the other system metrics you're missing out 
on.  The neat thing about collectl is it essentially does it all!  In 
the case of lustre that means if you run it with the defaults you'll 
get cpu, memory, network, and more in addition to the slab data.  
However, if you really want to get crazy, you can get the performance 
by ost and even the rpc stats.  The one negative with collectl is that 
while it can do a lot, that translates into a lot of options, which can 
be confusing at first.
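
For example (the display switches below are the same ones used further 
down in this thread; the exact lustre options vary a bit between collectl 
versions, so verify against your man page):

   collectl                  # defaults: cpu, memory, network and more
   collectl -sL -oT -i10     # lustre detail by ost, timestamped, every 10 secs
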
-mark

Balagopal Pillai wrote:
> Thanks Mark. This looks handy. I was about to put a cron job with vmstat 
> to see how the memory utilization progresses with the early morning rsync.
> Since I put another 4G in both OSSes this morning, hopefully it should be 
> enough for their operation.
>
> Regards
> Balagopal
>
>
> Mark Seger wrote:
>   
>> If you're really interested in tracking memory utilization, collectl 
>> - see http://collectl.sourceforge.net/ - when run as a daemon will 
>> collect/log all slab data once a minute, and you can change the 
>> frequency to anything you like.  You can then play it back later and 
>> see exactly what happened over time.  As another approach you can 
>> run it interactively, and if you specify the -oS switch you'll only see 
>> changes as they occur.  Including a 'T' will timestamp them, as in 
>> the example below:
>>
>> [root at cag-dl380-01 root]# collectl -sY -oST -i:1
>> # SLAB DETAIL
>> #                               <-----------Objects----------><---------Slab Allocation------>
>> #         Name                  InUse   Bytes   Alloc   Bytes   InUse   Bytes   Total   Bytes
>> 11:02:02 size-512                 146   74752     208  106496      21   86016      26  106496
>> 11:02:07 sigqueue                 319   42108     319   42108      11   45056      11   45056
>> 11:02:07 size-512                 208  106496     208  106496      26  106496      26  106496
>>
>> Since this isn't a lustre system there isn't a whole lot of activity...
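>>
>> Once the daemon has been logging for a while you can also play the data 
>> back with the same display switches, e.g. (the file name below is just 
>> illustrative - collectl typically writes its logs under /var/log/collectl):
>>
>>    collectl -p /var/log/collectl/HOST-20071224.raw.gz -sY -oST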
>>
>> -mark
>>
>> Andreas Dilger wrote:
>>     
>>> On Dec 23, 2007  18:01 -0400, Balagopal Pillai wrote:
>>>
>>>>            The cluster was made idle over the weekend to look at the 
>>>> Lustre RAM consumption issue. The RAM used during yesterday's rsync 
>>>> is still not freed up. Here is the output from free:
>>>>              total       used       free     shared    buffers     cached
>>>> Mem:       4041880    3958744      83136          0     876132     144276
>>>> -/+ buffers/cache:    2938336    1103544
>>>> Swap:      4096564        240    4096324
>>>>
>>> Note that this is normal behaviour for Linux.  RAM that is unused 
>>> provides no value, so all available RAM is used for cache until 
>>> something else needs that memory.
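>>>
>>> You can check that against the numbers above: free's -/+ buffers/cache 
>>> line already does the arithmetic, 3958744 - 876132 - 144276 = 2938336k 
>>> actually in use, and 83136 + 876132 + 144276 = 1103544k effectively free.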
>>>
>>>
>>>>           Looking at vmstat -m, there is something odd. It seems 
>>>> ext3_inode_cache and dentry_cache are the biggest occupants 
>>>> of RAM, with ldiskfs_inode_cache comparatively smaller:
>>>>
>>>> Cache                       Num  Total   Size  Pages
>>>> ldiskfs_inode_cache      430199 440044    920      4
>>>> ldlm_locks                10509  12005    512      7
>>>> ldlm_resources            10291  11325    256     15
>>>> buffer_head              230970 393300     88     45
>>>>
>>>> ext3_inode_cache         1636505 1636556    856      4
>>>> dentry_cache             1349923 1361216    240     16
>>>>
>>> This is odd, because Lustre doesn't use ext3 at all.  It uses ldiskfs
>>> (which is ext3 renamed + patches), so it is some non-Lustre filesystem
>>> usage which is consuming most of your memory.
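>>>
>>> Multiplying out the vmstat -m numbers above: ext3_inode_cache is 
>>> 1636556 x 856 bytes (about 1.3GB) and dentry_cache is 1361216 x 240 
>>> bytes (about 310MB), versus roughly 390MB of ldiskfs_inode_cache, so 
>>> the ext3 side really is the bulk of it.  A quick way to see which 
>>> filesystems those could be is to list what is mounted as ext3:
>>>
>>>    grep ext3 /proc/mounts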
>>>
>>>
>>>>              Is there anything in /proc, as explained in 
>>>> http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/ref-guide/s1-proc-directories.html 
>>>>
>>>> that can force the kernel to flush out the dentry_cache and 
>>>> ext3_inode_cache when the rsync is over and cache is not needed 
>>>> anymore? Thanks very much.
>>>>
>>> Only to unmount and remount the filesystem, on the server.  On Lustre
>>> clients there is a mechanism to flush Lustre cache, but that doesn't
>>> help you here.
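>>>
>>> (On 2.6.16 and newer kernels there is also /proc/sys/vm/drop_caches - 
>>> "echo 2 > /proc/sys/vm/drop_caches" frees unused dentries and inodes - 
>>> but the RHEL 4 kernel that page describes predates that interface.)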
>>>
>>> Cheers, Andreas
>>> -- 
>>> Andreas Dilger
>>> Sr. Staff Engineer, Lustre Group
>>> Sun Microsystems of Canada, Inc.
>>>
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at clusterfs.com
> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
>   




More information about the lustre-discuss mailing list