[Lustre-discuss] How to detect process owner on client

Sebastien Piechurski spiechurski at sgi.com
Tue Mar 1 03:40:53 PST 2011


Hi Satoshi,

I don't have a complete solution to your problem, but I have written a script that lets me find at least the Lustre client responsible for the bad I/O. We are using PBSPro with nodes set as job-exclusive, so determining the job and user from there is much easier.

The script does the following:
Dump the attributes in /proc/fs/lustre/obdfilter/*/exports/*/stats
Sleep a few seconds (tunable)
Dump all the attributes again, then use diff to see which clients changed their I/O count.
These changes are then sorted numerically.
The final result is a list of IP addresses, each with the number of I/Os done during the sleep period.
The last entry in the list (because it is sorted in ascending order) points to the responsible client(s).
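
For illustration, here is a rough Python sketch of the same idea. It is not my actual script (that one just uses diff and sort), and it rests on a few assumptions: that each stats file has lines of the form "<name> <count> samples [<unit>] ...", that the export directory name is the client NID, and that summing every sample counter is a good enough measure of "I/O count". The function names (parse_stats, snapshot, io_delta, report) are made up for this example.

```python
import glob
import os
import time
from collections import Counter

# Path from the description above; run this on an OSS.
STATS_GLOB = "/proc/fs/lustre/obdfilter/*/exports/*/stats"


def parse_stats(text):
    """Return {counter_name: sample_count} for the text of one stats file.

    Assumed line layout: "<name> <count> samples [<unit>] ...".
    Lines that do not match (e.g. snapshot_time) are skipped.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "samples" and fields[1].isdigit():
            counts[fields[0]] = int(fields[1])
    return counts


def snapshot(pattern=STATS_GLOB):
    """Return a Counter {client_nid: total sample count}, summed over all OSTs."""
    totals = Counter()
    for path in glob.glob(pattern):
        # Assumption: .../exports/<client NID>/stats
        nid = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as f:
                text = f.read()
        except OSError:
            continue  # the export went away between glob and open
        totals[nid] += sum(parse_stats(text).values())
    return totals


def io_delta(before, after):
    """Return [(nid, increase)] sorted ascending, so the busiest client is last."""
    deltas = ((nid, after[nid] - before.get(nid, 0)) for nid in after)
    return sorted((item for item in deltas if item[1] > 0), key=lambda item: item[1])


def report(interval=5, pattern=STATS_GLOB):
    """Take two snapshots `interval` seconds apart and print the per-client change."""
    before = snapshot(pattern)
    time.sleep(interval)
    after = snapshot(pattern)
    for nid, delta in io_delta(before, after):
        print(f"{delta:10d}  {nid}")
```

Calling report(10) on an OSS would print one line per client whose counters moved during the 10 seconds, with the heaviest client at the bottom. If only bulk I/O matters, parse_stats could be restricted to the read_bytes and write_bytes counters.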

Hope the method helps.


> -----Original Message-----
> From: lustre-discuss-bounces at lists.lustre.org 
> [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of 
> Satoshi Isono
> Sent: Friday, February 11, 2011 04:16
> To: lustre-discuss at lists.lustre.org
> Subject: [Lustre-discuss] How to detect process owner on client
> 
> Dear members,
> 
> I am looking for a way to detect the userid or jobid on 
> the Lustre client. Assume the following conditions:
> 
>  1) Users run jobs through a scheduler such as PBS Pro, LSF or SGE.
>  2) One user's processes dominate Lustre I/O.
>  3) Some Lustre servers (MDS?/OSS?) can detect high I/O 
> stress on each server.
>  4) But the Lustre server cannot map the I/O processes causing 
> the heavy stress to a jobid/userid, because there is no userid 
> on the Lustre servers.
>  5) I expect that Lustre can monitor this and make the mapping.
>  6) If (5) is possible, we can write a script which launches 
> a scheduler command such as qdel.
>  7) The heavy user's job will be killed by the job scheduler.
> 
> I want (5) as a Lustre capability, but I guess the current Lustre 
> 1.8 cannot do (5). Alternatively, are there any ways to map 
> Lustre processes to a userid/jobid using something like 
> rpctrace or nid stats? Could you please give me your advice or comments?
> 
> Regards,
> Satoshi Isono
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
> 


