[Lustre-discuss] Lustre SNMP module

Mark Seger Mark.Seger at hp.com
Thu Mar 20 15:32:47 PDT 2008


>> Be careful here.  You can certainly stick some data into an rrd but
>> certainly not all of it, especially if you want to collect a lot of
>> it at a reasonable frequency.  If you want accurate detail plots,
>> you've gotta go to the data stored on each separate system.  I just
>> don't see any way around this, at least not yet...
>>     
>
> Yes, you're absolutely right. Given its intrinsic multi-scale nature, a 
> RRD is well suited for keeping historical data on large time scales. 
> This could allow a very convenient graphical overview of the different 
> system metrics, but would be pointless for debugging purposes, where 
> you do need fine-grained data. That's where collectl is the most useful 
> for me. 
>
> But what about both? I don't see any reason why collectl couldn't 
> provide high-frequency accurate data to diagnose problems locally, and 
> at the same time allow less precise values to be aggregated in an RRD 
> for global visualization of multi-host systems.
>   
I agree 1000%...  The mental model I've been building in my head is to 
tell collectl to log its data locally and also write out an s-expression 
using --sexpr.  Then a daemon can periodically pick out the data it's 
interested in, at whatever frequency it's interested in, and forward it 
on up the line.
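
Here is a minimal sketch of the daemon side of that idea, in Python.  The 
s-expression layout in SAMPLE is an assumption made up for illustration, 
not collectl's actual --sexpr output, and the metric names and the send 
callback are hypothetical as well.  It only shows the shape of the thing: 
parse the file, pick out the few values you care about, pass them along.

```python
import re

# Tokens are "(", ")" or any run of characters without whitespace/parens.
TOKEN = re.compile(r"\(|\)|[^\s()]+")

# Hypothetical layout, NOT collectl's real --sexpr format.
SAMPLE = ("(sample (time 1206052367) (cpu (user 12) (sys 3)) "
          "(lustre (read_kb 2048) (write_kb 512)))")


def parse_sexpr(text):
    """Parse one s-expression into nested lists of strings."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # skip the closing ")"
            return items
        return tok

    return read()


def select(tree, wanted):
    """Return {dotted.path: value} for (name value) leaves listed in wanted."""
    found = {}

    def walk(node, prefix):
        if not isinstance(node, list) or not node:
            return
        # Assumes each list starts with its name, as in SAMPLE.
        path = prefix + [node[0]] if isinstance(node[0], str) else prefix
        rest = node[1:]
        if len(rest) == 1 and isinstance(rest[0], str):
            key = ".".join(path)
            if key in wanted:
                found[key] = rest[0]
        for child in rest:
            walk(child, path)

    walk(tree, [])
    return found


def forward_once(path, wanted, send):
    """Read the s-expression file once and hand the chosen values to send."""
    with open(path) as fh:
        values = select(parse_sexpr(fh.read()), wanted)
    send(values)
    return values
```

A real daemon would call forward_once() in a loop with a sleep at whatever 
interval it likes, with send pushing the values to an RRD or a central 
collector.  It would also have to cope with reading the file while it is 
being rewritten, which this sketch ignores.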
>> As a final note, I've put together a tutorial on using collectl in a
>> lustre environment and have uploaded a preliminary copy at
>> http://collectl.sourceforge.net/Tutorial-Lustre.html in case anyone
>> wants to preview it before I link it into the documentation.  
>> If nothing else, look at my very last example where I show what you 
>> can see by monitoring lustre at the same time as your network
>> interface.  
>>     
>
> Very good, thanks for this. The readahead experiment is insightful.
>   
It was to me when I first encountered the problem.
>> Did I also mention that collectl is probably one of the few tools
>> that can monitor your Infiniband traffic as well?
>>     
>
> That's why it rocks. :)
>
> Now the only thing which still makes me want to use other monitoring 
> software is the ability to get a global view. Centralized data 
> collection and easy graphing (RRD feeding) are still what I need most 
> of the time.
>   
I hear you here too.  That's the main reason I put in the ability to 
generate data in plottable format.  That's as close as I'm willing to go 
with providing a graphing capability in collectl itself.  I'm trying 
real hard to bound its scope as I figure it already has more than enough 
switches...  9-)
-mark




