[Lustre-discuss] Lustre SNMP module
Mark Seger
Mark.Seger at hp.com
Thu Mar 20 15:32:47 PDT 2008
>> Be careful here. You can certainly stick some data into an rrd, but
>> not all of it, especially if you want to collect a lot of
>> it at a reasonable frequency. If you want accurate detail plots,
>> you've gotta go to the data stored on each separate system. I just
>> don't see any way around this, at least not yet...
>>
>
> Yes, you're absolutely right. Given its intrinsic multi-scale nature, an
> RRD is well suited for keeping historical data on large time scales.
> This could allow a very convenient graphical overview of the different
> system metrics, but would be pointless for debugging purposes, where
> you do need fine-grained data. That's where collectl is the most useful
> for me.
>
> But what about both? I don't see any reason why collectl couldn't
> provide high-frequency accurate data to diagnose problems locally, and
> at the same time allow aggregating less precise values in RRD for
> global visualization of multi-host systems.
>
I agree 1000%... The mental model I've been building in my head is to
tell collectl to log its data locally and also write out an s-expression
using --sexpr. Then a daemon can periodically pick out the data it's
interested in, at whatever frequency it wants, and forward it on up
the line.
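
To make that a bit more concrete, here is a rough sketch of what such a
daemon could look like. It is only an illustration: the sample layout, the
field names (readkb, writekb) and the file path handed to poll_once() are
made up for the example and are not taken from collectl's real --sexpr
output, so adjust them to whatever your file actually contains. The
forward() callback is also hypothetical and stands in for whatever feeds
your RRD or central collector.

```python
import re
import time


def parse_sexpr(text):
    """Parse one parenthesised s-expression into nested Python lists."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # step over the closing ")"
            return items
        return tok

    return read()


def select(tree, wanted):
    """Collect (name value) leaf pairs whose name is in `wanted`."""
    found = {}

    def walk(node):
        if not isinstance(node, list):
            return
        is_leaf = len(node) == 2 and all(isinstance(x, str) for x in node)
        if is_leaf and node[0] in wanted:
            found[node[0]] = float(node[1])
        for child in node:
            walk(child)

    walk(tree)
    return found


def poll_once(path, wanted, forward):
    """Read the current s-expression file and forward the wanted values."""
    with open(path) as handle:
        values = select(parse_sexpr(handle.read()), wanted)
    forward(values)
    return values


def run(path, wanted, forward, interval, iterations):
    """Poll at the consumer's own (coarser) interval, not the collector's."""
    for _ in range(iterations):
        poll_once(path, wanted, forward)
        time.sleep(interval)


# Hypothetical sample; the real layout written by --sexpr may differ.
SAMPLE = "(sample (time 10) (cpu (user 3)) (lustre (readkb 2048) (writekb 512)))"
```

For example, select(parse_sexpr(SAMPLE), {"readkb", "writekb"}) picks just
the two Lustre counters out of the sample and ignores the rest. The point
of the split is that the local log keeps the fine-grained data, while the
daemon chooses its own fields and its own interval.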
>> As a final note, I've put together a tutorial on using collectl in a
>> lustre environment and have uploaded a preliminary copy at
>> http://collectl.sourceforge.net/Tutorial-Lustre.html in case anyone
>> wants to preview it before I link it into the documentation.
>> If nothing else, look at my very last example where I show what you
>> can see by monitoring lustre at the same time as your network
>> interface.
>>
>
> Very good, thanks for this. The readahead experiment is insightful.
>
It was to me when I first encountered the problem.
>> Did I also mention that collectl is probably one of the few tools
>> that can monitor your Infiniband traffic as well?
>>
>
> That's why it rocks. :)
>
> Now the only thing which still makes me want to use other monitoring
> software is the ability to get a global view. Centralized data
> collection and easy graphing (RRD feeding) are still what I need most
> of the time.
>
I hear you here too. That's the main reason I put in the ability to
generate data in plottable format. That's as close as I'm willing to go
with providing a graphing capability in collectl itself. I'm trying
real hard to bound its scope as I figure it already has more than enough
switches... 9-)
-mark