[Lustre-discuss] How to determine which lustre clients are loading filesystem.

Seger, Mark mark.seger at hp.com
Sun Jul 11 14:56:38 PDT 2010


Wojciech Turek <wjt27 at ...> writes:



> Thank you all for very useful suggestions. Andreas's approach, which uses rpc_history, gave exactly what I was looking for in a quite easy-to-read form.

> On 9 July 2010 18:26, Andreas Dilger <andreas.dilger-QHcLZuEGTsvQT0dZR+AlfA at public.gmane.org> wrote:

> On 2010-07-08, at 16:11, Bernd Schubert wrote:

> >> Bernd, would you (or anyone) be interested in enhancing those tools to be able to show stats data from multiple files at once (each prefixed by the device name and/or client NID)? I don't think it makes sense to create separate tools for this.

>



For what it's worth, you can get very detailed client-side stats from collectl.

The way it figures out what the client is doing is to actually look at the OST-level stats and add them up!  Why?  Because that means you can then replay the data and break things down by OST.
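To make the "add them up" idea concrete, here is a minimal Python sketch. It is not collectl's actual code or file format: the `(ost_name, read_kb, write_kb)` tuple layout and the OST names are assumptions made purely for illustration. The point is only that if you keep the per-OST samples, the client total is a simple sum and the per-OST breakdown is still there when you replay the data.

```python
from collections import defaultdict


def summarize(samples):
    """Sum per-OST samples into a client total, keeping the per-OST view.

    `samples` is an iterable of (ost_name, read_kb, write_kb) tuples for
    one interval. This layout is hypothetical, not a collectl format.
    """
    per_ost = defaultdict(lambda: [0, 0])
    for ost, read_kb, write_kb in samples:
        per_ost[ost][0] += read_kb
        per_ost[ost][1] += write_kb

    # The client-wide figure is just the sum over all OSTs.
    total_read = sum(r for r, _ in per_ost.values())
    total_write = sum(w for _, w in per_ost.values())
    breakdown = {ost: tuple(vals) for ost, vals in per_ost.items()}
    return (total_read, total_write), breakdown


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    totals, by_ost = summarize([
        ("OST0000", 100, 20),
        ("OST0001", 50, 0),
        ("OST0000", 10, 5),
    ])
    print("client total (read_kb, write_kb):", totals)
    for ost, (r, w) in sorted(by_ost.items()):
        print(ost, r, w)
```

Storing only the total would make the per-OST drill-down impossible later, which is why the sketch returns both.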



There are also client-side switches to look at BRW stats, readahead stats and even what's going on with metadata.  If you then plot the data with colplot you can drill down and look at all kinds of things.  For example, if you have data from multiple clients you can even compare it side-by-side.  Check out collectl-utils on sourceforge if you haven't yet.



Alas, I'm one of the few people (I think) who ever gets into this level of analysis, because I fear the number of switches tends to scare people off.  ;)



-mark



> >

> > I'm not sure if the existing lustre tools are really what we need. If you have a cluster with 200 or more clients and then want to figure out which clients are doing most IO, several lines per client provide too much output.

> I agree, but having a 200-column line is also not very useful. I like the "llobdstat" output where it prints the IO numbers, and then appends only the abbreviated values that are changing for that interval, instead of printing all of the values.

>

> > One line sorted by IO seems to be better, IMHO.

> The commands that I posted using the rpc_history file will print out a summary of all client RPC counts, sorted with the heaviest user first. Something similar could be done by aggregating all of the per-client stats as well, though it would mean touching a lot more input files for each interval.

>
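For anyone who wants to experiment with the counting approach, here is a rough Python sketch of "RPC count per client, busiest first". It is not the set of commands Andreas posted. It assumes each history line is colon-separated with the client identifier in the third field; that layout, the `client_field` parameter, and the sample lines are assumptions for illustration, so check the format on your own servers before relying on it.

```python
from collections import Counter


def count_rpcs_by_client(lines, client_field=2, sep=":"):
    """Return [(client, rpc_count), ...] sorted by count, largest first.

    Assumes one RPC per line with the client identifier in column
    `client_field` (0-based). The column position is an assumption.
    """
    counts = Counter()
    for line in lines:
        fields = line.strip().split(sep)
        if len(fields) > client_field:  # skip blank or short lines
            counts[fields[client_field]] += 1
    return counts.most_common()


if __name__ == "__main__":
    # Made-up history lines, for illustration only.
    sample = [
        "1:10.0.0.1@tcp:12345-10.0.0.7@tcp:x1:296:Complete",
        "2:10.0.0.1@tcp:12345-10.0.0.9@tcp:x2:296:Complete",
        "3:10.0.0.1@tcp:12345-10.0.0.7@tcp:x3:296:Complete",
    ]
    for client, n in count_rpcs_by_client(sample):
        print(f"{n:8d} {client}")
```

In practice the lines would come from the history files on the servers rather than a hard-coded list, for example by piping them in and passing `sys.stdin` as `lines`.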

> > I would be interested in enhancing the existing tools, but if I look at the number of open bugs I have, several of those have a higher priority (btw, this script is on my bug list (bug 22469)).

> I was actually hoping that someone else might take it up. The llstat and llobdstat scripts are perl, and there should be a good number of people who could do a bit of perl hacking.

> The scripts are currently "vmstat" or "iostat" like, in that they print out the parameters as they change over time. It might also be interesting (if someone has the perl-fu to do it) to have a "top" mode, where it resets the screen position each time and sorts the output from all of the clients.
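A "top" mode along these lines could be prototyped roughly as follows. This is a Python sketch rather than a patch to the perl scripts, and the names `render_top`, `run` and the `sample` callback are hypothetical. It assumes you already have some way to obtain a mapping of client to IO for the last interval; the sketch only covers the "reset the screen position and sort" part, using standard ANSI escape sequences.

```python
import sys
import time

# ANSI escapes: move the cursor to the top-left, then clear the screen.
CLEAR = "\033[H\033[2J"


def render_top(io_by_client, limit=10):
    """Format the busiest `limit` clients, largest IO value first."""
    rows = sorted(io_by_client.items(), key=lambda kv: kv[1], reverse=True)
    lines = [f"{'CLIENT':<24}{'IO/interval':>12}"]
    lines += [f"{client:<24}{value:>12}" for client, value in rows[:limit]]
    return "\n".join(lines)


def run(sample, interval=2.0, iterations=None, out=sys.stdout):
    """Redraw the sorted table every `interval` seconds.

    `sample` is a caller-supplied function returning {client: io_value}
    for the most recent interval (a hypothetical hook, not a Lustre API).
    """
    n = 0
    while iterations is None or n < iterations:
        out.write(CLEAR + render_top(sample()) + "\n")
        out.flush()
        time.sleep(interval)
        n += 1


if __name__ == "__main__":
    # Made-up static data; a real sampler would read the per-client stats.
    run(lambda: {"10.0.0.7@tcp": 5120, "10.0.0.9@tcp": 640}, iterations=3)
```

Keeping the formatting in `render_top` separate from the redraw loop means the same sorted output could also be printed once, "iostat" style, without the screen reset.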

>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Technical Lead
> Oracle Corporation Canada Inc.
>
> --
> Wojciech Turek
