[Lustre-discuss] finding performance issues

Fri Dec 10 12:14:15 PST 2010

On 12/10/2010 11:42 AM, Brock Palen wrote:
> We have a Lustre 1.6.x filesystem,

1.6 has been dead for well over a year. End Of Life.
>
> 4 OSSes: 3 x4500s and 1 DDN S2A6620
>
> Each OSS has four bonded 1 GbE interfaces or one 10 GbE interface.
>
> I have a user who is running a few hundred serial jobs that all access the same 16GB file. We striped the file over all the OSTs and are capped at 500-600MB/s no matter how many hosts are running. IO per OST is around 15-20MB/s (31 OSTs total).
>
> This set of jobs keeps reading in the same data set, and has been running for about 24 hours (the group of about 900 total jobs).
>
> *  Is there a recommendation of a better way to do these sorts of jobs?

Upgrade to the latest release of Lustre.

> The compute nodes have 48GB of RAM; he does not use much RAM for the
> job, just all the IO.
>
> * Is there a better way to tune?

Yes: upgrade to the code that has all the tuning fixes and enhancements,
Lustre 1.8.

> What should I be looking for to tune?

You are wasting your time tuning here.
1.8 supports many things, including cache on OSTs, which would likely
help a great deal in your case.
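
As a rough sanity check on the numbers quoted above, the per-OST rate multiplied by the OST count can be compared with the reported aggregate ceiling. The sketch below is only arithmetic on the figures in the original message; the variable and function names are made up for illustration, and it says nothing about where the bottleneck actually is.

```python
# Figures copied from the quoted message; names are illustrative only.
NUM_OSTS = 31                 # "31 total ost's"
PER_OST_MBPS = (15, 20)       # observed IO per OST, MB/s
REPORTED_AGG_MBPS = (500, 600)  # reported aggregate ceiling, MB/s
FILE_SIZE_GB = 16             # size of the shared input file
NODE_RAM_GB = 48              # RAM per compute node


def aggregate_range(num_osts, per_ost_range):
    """Return the (low, high) aggregate rate implied by a per-OST rate range."""
    low, high = per_ost_range
    return num_osts * low, num_osts * high


agg_low, agg_high = aggregate_range(NUM_OSTS, PER_OST_MBPS)

# The implied range overlaps the reported one, so the two observations agree.
ranges_overlap = agg_low <= REPORTED_AGG_MBPS[1] and REPORTED_AGG_MBPS[0] <= agg_high

# The whole file is smaller than one node's RAM, which is why caching
# (on the servers or the clients) is worth considering for repeated reads.
file_fits_in_node_ram = FILE_SIZE_GB < NODE_RAM_GB

print(f"implied aggregate: {agg_low}-{agg_high} MB/s")
print(f"consistent with reported ceiling: {ranges_overlap}")
print(f"file fits in one node's RAM: {file_fits_in_node_ram}")
```

The implied range brackets the reported 500-600MB/s, so the per-OST and aggregate observations describe the same ceiling. Since the 16GB file is re-read constantly, it is a plausible candidate for server-side caching.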

cliffw

>
> Thanks!
>
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> brockp at umich.edu
> (734)936-1985