[Lustre-discuss] finding performance issues

Andreas Dilger andreas.dilger at oracle.com
Fri Dec 10 23:51:05 PST 2010


On 2010-12-10, at 12:42, Brock Palen wrote:
> We have a Lustre 1.6.x filesystem,
> 
> 4 OSS: 3 Sun X4500 and 1 DDN S2A6620
> 
> Each OSS has four bonded 1GigE interfaces, or one 10GigE interface.
> 
> I have a user who is running a few hundred serial jobs that are all accessing the same 16GB file.  We striped the file over all the OSTs, and are capped at 500-600MB/s no matter the number of hosts running.  IO per OST is around 15-20MB/s (31 OSTs total).

What is the IO size?  Are all the clients both reading and writing this same file?  Presumably you see better performance when so many jobs are not running against the filesystem?

> This set of jobs keeps reading in the same data set, and has been running for about 24 hours (the group of about 900 total jobs).
> 
> *  Is there a recommendation of a better way to do these sorts of jobs?  The compute nodes have 48GB of RAM; he does not use much RAM for the job, just all the IO.

I agree with Cliff that the 1.8 OSS read cache will probably help the performance in this case.  OSS read cache does not need a client-side upgrade to work, though of course I'd suggest upgrading the clients anyway.
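For reference, once the servers are on 1.8 the read cache is controlled through obdfilter tunables on each OSS.  Something like the following should let you inspect it (parameter names as in 1.8; double-check against your build):

```shell
# Inspect the 1.8 OSS read-cache tunables on each OST (run on the OSS).
lctl get_param obdfilter.*.read_cache_enable         # 1 = cache bulk reads
lctl get_param obdfilter.*.writethrough_cache_enable # 1 = also cache writes
lctl get_param obdfilter.*.readcache_max_filesize    # largest file cached
```

With the file striped across all 31 OSTs, each OST only holds roughly 16GB/31, around 0.5GB of it, so the OSS page cache should comfortably hold the hot data for those repeated reads.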

1.8.5 was just released this week...

> * Is there a better way to tune?  What should I be looking for to tune?

Start by looking at /proc/fs/lustre/obdfilter/*/brw_stats on the OSTs.  It should be reset before the job (echo 0 to each file) so you get stats relevant to that job only.  You can also check iostat on the OSS nodes to see how busy the disks are.  They may be imbalanced due to being different hardware, and will only go as fast as the slowest OSTs.
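Concretely, that might look like this on each OSS (paths as in 1.6/1.8; verify on your systems):

```shell
# Zero the per-OST bulk-RPC histograms before launching the job
# (run as root on every OSS).
for f in /proc/fs/lustre/obdfilter/*/brw_stats; do
    echo 0 > "$f"
done

# While the job runs, watch per-disk utilization in 5-second samples.
# A few devices pegged near 100 %util while others sit idle would point
# at an imbalance between the X4500s and the DDN.
iostat -x 5
```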

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.



