[Lustre-discuss] collectl

Mark Seger Mark.Seger at hp.com
Wed Jul 30 04:35:49 PDT 2008


Andreas Dilger wrote:
> On Jul 29, 2008  18:43 -0400, Mark Seger wrote:
>   
>> your definitions are perfect and I'll add them to my documentation.  With  
>> respect to my comment about chmod on 100 files only incrementing the  
>> counter once, it did it 100 times this time when I tried it, so never  
>> mind...
>>
>> one counter you didn't mention is getstatus.  when does that get updated?
>>     
>
> That one is only used once at mount... 
>   
ahhh, I was playing with mounting/unmounting, watching 
connect/disconnect change and did see that one change with mounts.

Back on the subject of collectl, the general philosophy is to show rates 
as counters/sec, so that when they hit an unexpected high or low, they 
tend to jump out at you.
Given that mounts/unmounts typically happen in large chunks and not all 
that often, I'm not sure showing their rates is all that useful.  
I suppose one might also make that argument about things like statfs, 
getattr - the only time I was able to make them change was in response 
to lfs commands. Might that logic also be applied to extended attributes 
and acl counters which I suspect also fall into the category of slowly 
changing counters?  In fact, if you're sampling data once every 10 
seconds (which is the daemon default) and a counter is incremented by 4 
or fewer during that interval, it will show up as a rate of 0/sec!
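
To make the arithmetic concrete, here is a minimal Python sketch of how a 
counter delta turns into a displayed rate. It is not collectl's actual code 
(collectl is written in Perl); the function name displayed_rate is 
hypothetical, and the round-to-nearest behavior is an assumption inferred 
from the "4 or fewer shows 0/sec" observation above.

```python
def displayed_rate(prev_count, curr_count, interval_secs=10):
    """Return the whole-number rate/sec shown for one sampling interval.

    Assumption: the displayed value is the delta divided by the interval,
    rounded to the nearest integer (halves round up).
    """
    delta = curr_count - prev_count
    return int(delta / interval_secs + 0.5)


# A counter that moves by 4 in a 10 second interval is 0.4/sec,
# which displays as 0; a move of 5 is 0.5/sec, which displays as 1.
low = displayed_rate(100, 104)
high = displayed_rate(100, 105)
```

Under that assumption, any counter that changes by fewer than 5 per 10 
second sample would never show up in a rate display at all.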

On the other hand, it seems like the 'reint' counters are the ones that 
tend to change a lot. Perhaps a clue is that they're all prefixed with reint, 
which leads me to ask if there is some simple definition of what reint 
actually means other than 'reintegrated operations'?  Perhaps such a 
definition will help explain why setattr is a reint counter but getattr 
is not.  In fact, I have seen getattr_lock change a lot more than 
getattr.  What is the difference between the 2 (obviously getattr_lock 
involves some sort of lock, but it must be used for more than just 
incrementing getattr, since they don't change together)?

All that said, it feels like the data to report is all the reints, 
getattr, getattr_lock and sync.  As a side note, collectl will collect 
all the mds data and save it in its 'raw' file, so it is still possible 
to get at even if it is not reported in a standard display format.
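
As a rough illustration of that selection, the Python sketch below parses 
stats lines of the form quoted further down and keeps only the reint 
counters plus getattr, getattr_lock and sync. The function names are 
hypothetical, and the mds_getattr, mds_getattr_lock and mds_sync names are 
an assumption based on the mds_ prefix seen on the other counters.

```python
# Assumed counter names (mds_ prefix inferred from the other counters).
REPORTED = ("mds_getattr", "mds_getattr_lock", "mds_sync")


def parse_stats(text):
    """Map counter name -> sample count for lines like
    'mds_reint_link  51315 samples [reqs] 1 1 51315 51315'."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only lines shaped like '<name> <count> samples ...'.
        if len(fields) >= 3 and fields[2] == "samples" and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


def select_reported(counters):
    """Keep every reint counter plus the few named in REPORTED."""
    return {
        name: count
        for name, count in counters.items()
        if name.startswith("mds_reint_") or name in REPORTED
    }


sample = """\
mds_reint_link            51315 samples [reqs] 1 1 51315 51315
mds_reint_rename         224241 samples [reqs] 1 1 224241 224241
mds_getxattr              36089 samples [usec] 9 8996 675208 252525110
"""

reported = select_reported(parse_stats(sample))
```

Here the two reint counters are kept and mds_getxattr is dropped from the 
display set, though it would still be in the parsed data.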

-mark

>> -mark
>>
>> Andreas Dilger wrote:
>>     
>>> On Jul 29, 2008  14:36 -0400, Mark Seger wrote:
>>>   
>>>       
>>>> One thing that confuses me about lustre counters, and maybe others, 
>>>> is I  don't really know what they mean, when they change and in fact 
>>>> how to  stimulate them to change.  For example, on my system I'm 
>>>> doing a watch  of /proc/fs/lustre/mdt/MDS/mds/stats and only see 1 
>>>> reint counter,  because the others are all 0.  So I went and did some 
>>>> file renames, and  chmods and sure enough, the other counters did 
>>>> appear.  Cool!
>>>>     
>>>>         
>>> Yes, this is expected.  We dropped the "0" counters because they are very
>>> noisy and useless in most contexts.
>>>
>>>   
>>>       
>>>> The easiest thing for me to do is to simply say that reint_setattr   
>>>> counts the number of setattrs, but that would be a pretty weak   
>>>> definition. When I did a single chmod on 100 files, setattr 
>>>> only  incremented by 1 and I expected it to increment by 100.
>>>>     
>>>>         
>>> It should have been incremented by 100, and if it didn't it is possibly
>>> a bug.
>>>
>>>   
>>>       
>>>> want to be the one responsible for the words or all you're going to 
>>>> see  is 'reint_setattr counts the number of setattr calls' and I 
>>>> really don't  think that would be all that useful to anyone.
>>>>     
>>>>         
>>> "reint_setattr" includes all operations that modify inode attributes,
>>> including chmod, chown, touch, etc.
>>>
>>>   
>>>       
>>>>>>>> mds_reint_create          11018837 samples [reqs] 1 1 11018837
>>> For mknod and mkdir operations, also used by NFS servers internally
>>> when creating files.
>>>
>>>   
>>>       
>>>>>>>> mds_reint_link            51315 samples [reqs] 1 1 51315 51315
>>>>>>>>             
>>>>>>>>                 
>>> For hard or symbolic links, like with "ln"
>>>
>>>   
>>>       
>>>>>>>> mds_reint_rename         224241 samples [reqs] 1 1 224241 224241
>>>>>>>>             
>>>>>>>>                 
>>> For file and directory renames, like with "mv".
>>>
>>>   
>>>       
>>>>>>>> mds_reint_unlink          13109877 samples [reqs] 1 1 13109877
>>> For removing files and directories, like with "rm" or "rmdir".
>>>
>>>   
>>>       
>>>>>>>> mds_getxattr              36089 samples [usec] 9 8996 675208 252525110
>>>>>>>>             
>>>>>>>>                 
>>> For extended attributes and ACLs, like with "getfattr" or "getfacl".
>>>
>>>   
>>>       
>>>>>>>> mds_setxattr              1230 samples [usec] 123 10110 263367
>>> For extended attributes and ACLs, like with "setfattr" or "setfacl".
>>>
>>> Cheers, Andreas
>>> --
>>> Andreas Dilger
>>> Sr. Staff Engineer, Lustre Group
>>> Sun Microsystems of Canada, Inc.
>>>   
>>>       
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>   



