[Lustre-discuss] collectl
Mark Seger
Mark.Seger at hp.com
Wed Jul 30 04:35:49 PDT 2008
Andreas Dilger wrote:
> On Jul 29, 2008 18:43 -0400, Mark Seger wrote:
>
>> your definitions are perfect and I'll add them to my documentation. With
>> respect to my comment about chmod on 100 files only incrementing the
>> counter once, it did it 100 times this time when I tried it, so never
>> mind...
>>
>> one counter you didn't mention is getstatus. when does that get updated?
>>
>
> That one is only used once at mount...
>
ahhh, I was playing with mounting/unmounting, watching
connect/disconnect change and did see that one change with mounts.
Back on the subject of collectl, the general philosophy is to show rates
as counters/sec, so that when they hit an unexpected high or low, they
tend to jump out at you.
Given that mounts/unmounts typically happen in large chunks and not all
that often, I'm not sure showing their rates is all that useful.
I suppose one might also make that argument about things like statfs,
getattr - the only time I was able to make them change was in response
to lfs commands. Might that logic also be applied to extended attributes
and acl counters which I suspect also fall into the category of slowly
changing counters? In fact if you're sampling data once every 10
seconds (which is the daemon default) and a counter is incremented by 4
or less during that sample, it will show up at a rate of 0/sec!
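To make the rounding concrete, here is a minimal Python sketch. It assumes
the displayed rate is the counter delta divided by the sample interval,
rounded to the nearest whole number. That rounding rule and the name
per_sec are assumptions for illustration, not taken from collectl itself.

```python
def per_sec(delta, interval_secs):
    """Counter delta over one sample, as a whole-number rate per second.

    Rounds to the nearest integer (halves round up), using integer math.
    This rounding rule is an assumption, not collectl's documented behavior.
    """
    return (2 * delta + interval_secs) // (2 * interval_secs)


# With a 10 second interval, small deltas vanish from the display.
for delta in (4, 5, 100):
    print(delta, "events in 10s ->", per_sec(delta, 10), "/sec")
```

Under that assumption, a counter that moves by only a few counts per
sample is indistinguishable from an idle one, which is the concern with
reporting slowly changing counters as rates.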
On the other hand, it seems like the 'reint' counters are the ones that
tend to change a lot. Perhaps a clue is they're all prefaced with reint
which leads me to ask if there is some simple definition of what reint
actually means other than 'reintegrated operations'? Perhaps such a
definition will help explain why setattr is a reint counter but getattr
is not. In fact, I have seen getattr_lock change a lot more than
getattr. What is the difference between the 2 (obviously the latter is
some sort of lock but it must be used more than just when incrementing
getattr since they don't change together)?
That all said, it feels like the data to report is all the reints,
getattr, getattr_lock and sync. As a side note, collectl will collect
all the mds data and save it in its 'raw' file, so it is still possible
to get at it even if it is not reported in a standard display format.
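For anyone post-processing the stats themselves, here is a rough Python
sketch of turning two samples of the mds stats text into per-counter
deltas. It assumes each counter line has the shape shown in the quoted
output below (name, count, the word "samples", a bracketed unit, then
numeric fields), and it treats a counter that is absent from a sample as
0, since counters that are still 0 are not listed. The function names and
the second sample's values are made up for illustration.

```python
def parse_stats(text):
    """Return {counter_name: sample_count} from stats-file text."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Counter lines look like: name count "samples" [unit] min max ...
        # Anything else (e.g. a timestamp line) is skipped.
        if len(fields) >= 4 and fields[2] == "samples":
            counts[fields[0]] = int(fields[1])
    return counts


def deltas(prev, curr):
    """Per-counter change between two samples; absent counters count as 0."""
    names = set(prev) | set(curr)
    return {n: curr.get(n, 0) - prev.get(n, 0) for n in names}


# First sample: a line as it appears in the quoted output.
before = parse_stats("mds_reint_link 51315 samples [reqs] 1 1 51315 51315\n")
# Second sample: hypothetical values, with a counter appearing for the first time.
after = parse_stats(
    "mds_reint_link 51320 samples [reqs] 1 1 51320 51320\n"
    "mds_reint_rename 3 samples [reqs] 1 1 3 3\n"
)
changes = deltas(before, after)
print(changes)
```

The point of the default of 0 is that a counter showing up for the first
time is reported as a change from zero rather than being dropped.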
-mark
>> -mark
>>
>> Andreas Dilger wrote:
>>
>>> On Jul 29, 2008 14:36 -0400, Mark Seger wrote:
>>>
>>>
>>>> One thing that confuses me about lustre counters, and maybe others,
>>>> is I don't really know what they mean, when they change and in fact
>>>> how to stimulate them to change. For example, on my system I'm
>>>> doing a watch of /proc/fs/lustre/mdt/MDS/mds/stats and only see 1
>>>> reint counter, because the others are all 0. So I went and did some
>>>> file renames, and chmods and sure enough, the other counters did
>>>> appear. Cool!
>>>>
>>>>
>>> Yes, this is expected. We dropped the "0" counters because they are very
>>> noisy and useless in most contexts.
>>>
>>>
>>>
>>>> The easiest thing for me to do is to simply say that reint_setattr
>>>> counts the number of setattrs, but that would be a pretty weak
>>>> definition. When I did a single chmod on 100 files, setattr
>>>> only incremented by 1 and I expected it to increment by 100.
>>>>
>>>>
>>> It should have been incremented by 100, and if it didn't it is possibly
>>> a bug.
>>>
>>>
>>>
>>>> want to be the one responsible for the words or all you're going to
>>>> see is 'reint_setattr counts the number of setattr calls' and I
>>>> really don't think that would be all that useful to anyone.
>>>>
>>>>
>>> "reint_setattr" includes all operations that modify inode attributes,
>>> including chmod, chown, touch, etc.
>>>
>>>
>>>
>>>>>>>> mds_reint_create 11018837 samples [reqs] 1 1 11018837
>>>>>>>>
>>> For mknod and mkdir operations, also used by NFS servers internally
>>> when creating files.
>>>
>>>
>>>
>>>>>>>> mds_reint_link 51315 samples [reqs] 1 1 51315 51315
>>>>>>>>
>>>>>>>>
>>> For hard or symbolic links, like with "ln"
>>>
>>>
>>>
>>>>>>>> mds_reint_rename 224241 samples [reqs] 1 1 224241 224241
>>>>>>>>
>>>>>>>>
>>> For file and directory renames, like with "mv".
>>>
>>>
>>>
>>>>>>>> mds_reint_unlink 13109877 samples [reqs] 1 1 13109877
>>>>>>>>
>>> For removing files and directories, like with "rm" or "rmdir".
>>>
>>>
>>>
>>>>>>>> mds_getxattr 36089 samples [usec] 9 8996 675208 252525110
>>>>>>>>
>>>>>>>>
>>> For extended attributes and ACLs, like with "getfattr" or "getfacl".
>>>
>>>
>>>
>>>>>>>> mds_setxattr 1230 samples [usec] 123 10110 263367
>>>>>>>>
>>> For extended attributes and ACLs, like with "setfattr" or "setfacl".
>>>
>>> Cheers, Andreas
>>> --
>>> Andreas Dilger
>>> Sr. Staff Engineer, Lustre Group
>>> Sun Microsystems of Canada, Inc.
>>>
>>>
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>