[Lustre-discuss] finding performance issues

Fri Dec 10 11:42:11 PST 2010

We have a Lustre 1.6.x filesystem:

4 OSSes: 3 x4500s and 1 DDN S2A6620.

Each OSS has either 4 bonded 1GbE interfaces or 1 10GbE interface.

I have a user who is running a few hundred serial jobs that all access the same 16GB file. We striped the file over all the OSTs, and throughput tops out at 500-600MB/s no matter how many hosts are running. I/O per OST is around 15-20MB/s (31 OSTs total).

This set of jobs keeps reading in the same data set and has been running for about 24 hours (about 900 jobs in total).
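
As a rough sanity check on the figures above, the short Python sketch below multiplies the per-OST rate by the OST count and compares it with the raw network line rate of the servers. The 125 MB/s per Gbit/s conversion and the idea that bonded links scale perfectly are simplifying assumptions, not measurements, and the function names are only illustrative.

```python
# Back-of-envelope check of the numbers quoted in this post.
# Assumptions (not measured): 1 Gbit/s = 125 MB/s raw line rate,
# no protocol overhead, and bonded links that scale perfectly.

NUM_OSTS = 31               # total OSTs, from the post
PER_OST_MBPS = (15, 20)     # observed per-OST I/O range in MB/s
NUM_OSS = 4                 # OSS count, from the post
MBPS_PER_GBIT = 125         # assumed raw conversion factor


def aggregate_range(num_osts, per_ost_range):
    """Return the (low, high) aggregate rate implied by the per-OST rates."""
    low, high = per_ost_range
    return num_osts * low, num_osts * high


def oss_line_rate(links, gbit_per_link):
    """Return the raw line rate in MB/s for one OSS under the assumptions above."""
    return links * gbit_per_link * MBPS_PER_GBIT


agg_low, agg_high = aggregate_range(NUM_OSTS, PER_OST_MBPS)
bonded = oss_line_rate(4, 1)     # 4 x 1GbE bonded
ten_gig = oss_line_rate(1, 10)   # 1 x 10GbE
# Worst case for the network: every OSS has only the bonded 1GbE links.
network_floor = NUM_OSS * bonded

print(f"Aggregate implied by per-OST rates: {agg_low}-{agg_high} MB/s")
print(f"Raw line rate per OSS: {bonded} MB/s bonded, {ten_gig} MB/s 10GbE")
print(f"Lowest possible raw server line rate: {network_floor} MB/s")
```

The per-OST numbers are consistent with the 500-600MB/s aggregate, and under these idealized assumptions that aggregate is well below the combined raw line rate of the four servers.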

* Is there a better way to run these sorts of jobs? The compute nodes have 48GB of RAM; the jobs use very little memory, so nearly all of the load is I/O.

* Is there a better way to tune for this workload? What should I be looking at to find the bottleneck?

Thanks!

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985