[Lustre-discuss] collectl

Andreas Dilger adilger at sun.com
Wed Jul 30 14:50:22 PDT 2008


On Jul 30, 2008  14:52 -0400, Mark Seger wrote:
> I just did an interesting experiment.  If I run the command 'lfs df' in  
> a tight loop, the reint_statfs counter increments only once, but if I put  
> in a sleep 1, it increments on every call.  Is this a bug or a  
> feature?  Is it doing some kind of short-lived cache lookup in the first  
> case but not the second?

Yes, the client caches the "df" data for up to one second, so that something
like "grep [0-9] /proc/fs/lustre/{mdc,osc}/*/{kbytes,files}*" does not
issue a separate STATFS RPC for each file it reads.
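A minimal sketch of a one-second, time-bounded cache like the one described above. It is an illustration only, not the Lustre client code (which is in C): `StatfsCache`, `StatfsResult`, `fetch` and `max_age` are hypothetical names, and `fetch` merely stands in for the STATFS RPC. An injectable clock lets the tight-loop versus `sleep 1` experiment be replayed deterministically.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StatfsResult:
    # Hypothetical subset of the fields a statfs reply might carry.
    kbytes_total: int
    kbytes_free: int
    files_total: int
    files_free: int


class StatfsCache:
    """Return cached statfs data while it is younger than max_age seconds;
    otherwise call fetch() (a stand-in for a STATFS RPC) and cache the result."""

    def __init__(self, fetch: Callable[[], StatfsResult], max_age: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._max_age = max_age
        self._clock = clock
        self._value: Optional[StatfsResult] = None
        self._fetched_at = 0.0
        self.rpc_count = 0  # how many times the "RPC" was actually issued

    def get(self) -> StatfsResult:
        now = self._clock()
        if self._value is None or now - self._fetched_at >= self._max_age:
            self._value = self._fetch()
            # In this sketch the age is measured from the last fetch, not the
            # last read, so frequent reads cannot keep old data alive forever.
            self._fetched_at = now
            self.rpc_count += 1
        return self._value


if __name__ == "__main__":
    fake_now = [0.0]
    cache = StatfsCache(lambda: StatfsResult(1000, 400, 500, 200),
                        clock=lambda: fake_now[0])
    for _ in range(5):          # "tight loop": five calls well within one second
        cache.get()
        fake_now[0] += 0.125
    print(cache.rpc_count)      # 1
    for _ in range(5):          # "sleep 1" between calls
        fake_now[0] += 1.0
        cache.get()
    print(cache.rpc_count)      # 6
```

This reproduces the pattern in the experiment: one RPC for a burst of calls, one RPC per call once the calls are a second or more apart.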

A bug in the caching of statfs data was fixed recently: if statfs was
requested too frequently, the cached data would not be refreshed properly
from the servers.

> Kilian CAVALOTTI wrote:
>>> useful. I suppose one might also make that argument about things like
>>> statfs and getattr; the only time I was able to make them change was in
>>> response to lfs commands. Might that logic also be applied to
>>> extended attribute and ACL counters, which I suspect also fall into
>>> the category of slowly changing counters?      
>>
>> If you have ACLs enabled on your MDS, then every "ls -l" will induce  
>> getxattr()s and the mds_getxattr counter will be increased by as much.  
>> So this can change quickly. mds_setxattr, on the other hand, may change 
>> less often, since you usually set ACLs less often than you list files.  
>> But it can still be interesting to see if mds_setxattr goes through the 
>> roof.
>>
>>   
>>> On the other hand, it seems like the 'reint' counters are the ones
>>> that tend to change a lot. Perhaps a clue is they're all prefaced
>>> with reint which leads me to ask if there is some simple definition
>>> of what reint actually means other than 'reintegrated operations'?    
>>>  
>>
>> I'd bet on "request identification" or something along those lines.
>>
>>   
>>> Perhaps such a definition will help explain why setattr is a reint
>>> counter but getattr is not.  In fact, I have seen getattr_lock change
>>> a lot more than getattr.  What is the difference between the two
>>> (obviously the latter involves some sort of lock, but it must be used
>>> for more than just the cases that increment getattr, since they don't
>>> change together)?
>>>     
>>
>> I'm only speculating here, but I believe that extended attributes which 
>> are modifiable by a user on a client (like ACLs) are counted in  
>> *_xattr, while internal extended attributes used by the MDS are  
>> counted in getattr.
>>
>>   
>>> That all said, it feels like the data to report is all the reints,
>>> getattr, getattr_lock and sync.      
>>
>> I would also be interested in seeing (dis)connect (this can probably  
>> reveal network problems, if it increases too much), as well as quotactl 
>> and get/setxattr, since I use quotas and ACLs. :)
>>
>>
>> Cheers,
>>   
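The counters suggested above (the reints, getattr, getattr_lock, sync, (dis)connect, quotactl, get/setxattr) are running totals, so what is interesting to report is how fast they change. A minimal sketch of turning two snapshots into per-second rates follows. It assumes a stats file made of lines of the form `name count samples [unit] ...` (lines such as `snapshot_time` are skipped); that format, the function names `parse_stats` and `rates`, and the counter names in the example are illustrative assumptions, not taken from this thread.

```python
from typing import Dict, Iterable, Optional


def parse_stats(text: str) -> Dict[str, int]:
    """Parse lines like 'getattr 130 samples [reqs]' into {name: count}.
    Assumed format; lines that do not match it are ignored."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[2] == "samples" and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts


def rates(before: Dict[str, int], after: Dict[str, int], interval: float,
          names: Optional[Iterable[str]] = None) -> Dict[str, float]:
    """Per-second change of each counter between two snapshots.
    If names is given, only those counters are reported."""
    wanted = set(names) if names is not None else None
    out: Dict[str, float] = {}
    for name, new in after.items():
        if wanted is not None and name not in wanted:
            continue
        delta = new - before.get(name, 0)
        if delta < 0:
            # Assumption: a counter that went backwards was reset (e.g. the
            # stats were cleared), so count from zero.
            delta = new
        out[name] = delta / interval
    return out


if __name__ == "__main__":
    snap1 = """snapshot_time 1217454622.000000 secs.usecs
getattr 100 samples [reqs]
getxattr 50 samples [reqs]
connect 2 samples [reqs]
"""
    snap2 = """snapshot_time 1217454632.000000 secs.usecs
getattr 130 samples [reqs]
getxattr 250 samples [reqs]
connect 2 samples [reqs]
"""
    print(rates(parse_stats(snap1), parse_stats(snap2), 10.0))
    # {'getattr': 3.0, 'getxattr': 20.0, 'connect': 0.0}
```

A sudden jump in a normally quiet counter such as connect is the kind of signal the thread suggests could point at network problems.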

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
