[Lustre-discuss] Interpreting stats files
Dragseth Roy Einar
roy.dragseth at uit.no
Tue Nov 25 13:49:33 PST 2014
> -----Original Message-----
> From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-
> bounces at lists.lustre.org] On Behalf Of Dilger, Andreas
> Sent: 8. november 2014 00:26
> To: Dragseth Roy Einar; lustre-discuss at lists.lustre.org
> Subject: Re: [Lustre-discuss] Interpreting stats files
> On 2014/11/07, 4:06 PM, "Dragseth Roy Einar" <roy.dragseth at uit.no> wrote:
> >Many thanks for the quick replies. lltop seems to be a good start for
> >a tool to single out the heaviest IO users. Just need to create a
> >wrapper that maps the node names to torque jobids.
> >Have a nice weekend!
> If you have Lustre 2.4 or later, you can enable the "Jobstats" aka "JobID"
> functionality in Lustre and it will handle the mapping of RPC statistics to
> Torque jobids already.
> This is described in the Lustre User Manual.
The job stats feature looks really nice, but we are still on Lustre 2.1. I will look into it when we upgrade.
I created a small Python script that restructures the lltop output and identifies which jobs (and users) on the nodes are hitting the Lustre file system hardest. It has been accepted into the contrib section of lltop, though it only works with Torque.
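To illustrate the idea, here is a minimal sketch of the node-to-job aggregation. It is not the contrib script itself. The function name rank_jobs, the input shapes (a per-node MB total taken from lltop-style output, and a node-to-jobs mapping as one might build from Torque's node listing), and the even split of a node's IO between jobs sharing that node are all assumptions made for the example.

```python
from collections import defaultdict


def rank_jobs(node_io, node_jobs):
    """Rank jobs by the IO attributed to them.

    node_io:   {hostname: MB moved}  (assumed to be parsed from lltop-style output)
    node_jobs: {hostname: [(jobid, user), ...]}  (assumed to come from Torque)
    Returns a list of ((jobid, user), MB) sorted from heaviest to lightest.
    """
    totals = defaultdict(float)
    for host, mb in node_io.items():
        jobs = node_jobs.get(host, [])
        if not jobs:
            # No job known on this node: keep the traffic visible under the host name.
            totals[("unknown", host)] += mb
            continue
        # Assumption: jobs sharing a node get an equal share of that node's IO.
        share = mb / len(jobs)
        for job in jobs:
            totals[job] += share
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    io = {"c1": 100.0, "c2": 50.0, "c3": 10.0}
    jobs = {"c1": [("101", "alice")], "c2": [("101", "alice"), ("102", "bob")]}
    for (jobid, user), mb in rank_jobs(io, jobs):
        print(jobid, user, mb)
```

Splitting evenly is only a rough attribution. On nodes shared between jobs, lltop-style per-node numbers cannot say which job did the IO, which is what Jobstats solves on newer Lustre versions.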
Again, thanks for the help and comments,