[Lustre-discuss] collectl
Andreas Dilger
adilger at sun.com
Wed Jul 30 14:50:22 PDT 2008
On Jul 30, 2008 14:52 -0400, Mark Seger wrote:
> I just did an interesting experiment. If I run the command 'lfs df' in
> a tight loop, the reint_statfs counter increments once, but if I stick
> in a sleep 1, it increments on every call. Is this a bug or a
> feature? Is it doing some kind of short-lived cache lookup in the first
> case but not the second?
Yes, the client will cache the "df" data for up to a second, to allow
something like "grep [0-9] /proc/fs/lustre/{mdc,osc}/*/{kbytes,files}*"
to avoid doing STATFS RPCs for each file.
A bug in the caching of statfs data was fixed recently: if statfs was
requested too frequently, the cached data would not refresh properly
from the servers.
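
As a rough illustration of that kind of short-lived caching, here is a
minimal Python sketch. It is not the Lustre client code; the class name
StatfsCache, the fetch callback, and the rpc_count field are all
hypothetical and exist only to show why a tight loop sees one RPC while
a loop with "sleep 1" sees one per call.

```python
import time


class StatfsCache:
    """Cache a statfs-like result for a short time-to-live (TTL).

    Hypothetical sketch: 'fetch' stands in for the STATFS RPC and
    'rpc_count' stands in for a counter such as reint_statfs.
    """

    def __init__(self, fetch, ttl=1.0, clock=time.monotonic):
        self._fetch = fetch      # callable that does the expensive lookup
        self._ttl = ttl          # seconds a cached result stays valid
        self._clock = clock      # injectable clock, handy for testing
        self._value = None
        self._stamp = None       # time of the last real fetch
        self.rpc_count = 0       # number of real fetches performed

    def get(self):
        now = self._clock()
        # Refresh only when nothing is cached or the entry has aged out.
        if self._stamp is None or now - self._stamp >= self._ttl:
            self._value = self._fetch()
            self._stamp = now    # stamped on real fetches only
            self.rpc_count += 1
        return self._value
```

Calling get() repeatedly within one second performs a single fetch;
calling it once per second performs a fetch every time. In this sketch
the timestamp is updated only on a real fetch, so frequent callers
cannot keep a stale entry alive past the TTL.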
> Kilian CAVALOTTI wrote:
>>> useful. I suppose one might also make that argument about things like
>>> statfs, getattr - the only time I was able to make them change was in
>>> response to lfs commands. Might that logic also be applied to
>>> extended attributes and acl counters which I suspect also fall into
>>> the category of slowly changing counters?
>>
>> If you have ACLs enabled on your MDS, then every "ls -l" will induce
>> getxattr()s and the mds_getxattr counter will be increased by as much.
>> So this can change quickly. mds_setxattr, on the other hand, may change
>> less often, since you usually set ACLs less often than you list files.
>> But it can still be interesting to see if mds_setxattr goes through the
>> roof.
>>
>>
>>> On the other hand, it seems like the 'reint' counters are the ones
>>> that tend to change a lot. Perhaps a clue is they're all prefaced
>>> with reint which leads me to ask if there is some simple definition
>>> of what reint actually means other than 'reintegrated operations'?
>>>
>>
>> I'd bet on "request identification" or something along those lines.
>>
>>
>>> Perhaps such a definition will help explain why setattr is a reint
>>> counter but getattr is not. In fact, I have seen getattr_lock change
>>> a lot more than getattr. What is the difference between the 2
>>> (obviously the latter is some sort of lock but it must be used more
>>> than just when incrementing getattr since they don't change
>>> together)?
>>>
>>
>> I'm only speculating here, but I believe that extended attributes which
>> are modifiable by a user on a client (like ACLs) are counted in
>> *_xattr, while internal extended attributes used by the MDS are
>> counted in getattr.
>>
>>
>>> That all said, it feels like the data to report is all the reints,
>>> getattr, getattr_lock and sync.
>>
>> I would also be interested in seeing (dis)connect (this can probably
>> reveal network problems, if it increases too much), as well as quotactl
>> and get/setxattr, since I use quotas and ACLs. :)
>>
>>
>> Cheers,
>>
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.