[Lustre-discuss] performance tuning

Martin Pokorny mpokorny at nrao.edu
Thu Jul 2 15:43:22 PDT 2009


Hi Andreas,

Thanks for your informative reply. In general terms, what you've written 
confirms my suspicions as to the underlying factors limiting the 
filesystem's performance in my application. I've interspersed a few 
comments below.

Andreas Dilger wrote:
> Note that using single SCSI disks means you have no redundancy of your
> data.  If any disk is lost, and you are striping your files over all
> of the OSTs (as it seems from below) then all of your files will also
> lose data.  That might be fine if Lustre is just used as a scratch
> filesystem, but it might also not be what you are expecting.

The Lustre filesystem in this application is, in fact, a scratch 
filesystem. Once the files have been written, they are copied to an 
archive area. Although I might be interested in availability/reliability 
for this filesystem to some degree in the future, presently it's 
performance that I'm after.

> Writing small file chunks from many clients to a single file is definitely
> one way to have very bad IO performance with Lustre.
> 
> Some ways to improve this:
> - have the application aggregate writes some amount before submitting
>   them to Lustre.  Lustre by default enforces POSIX coherency semantics,
>   so it will result in lock ping-pong between client nodes if they are
>   all writing to the same file at one time

That's a possibility, but limited to a degree by the instrument 
streaming the raw data into the cluster, and the output file format. I'm 
already in discussion with others on the project about this approach.
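
For concreteness, the kind of aggregation being discussed could be 
sketched roughly as follows. This is only an illustration in Python, not 
the application's actual code: the class name WriteAggregator, its 
methods, and the 1 MiB default chunk size are all hypothetical 
placeholders, and a real version would have to respect the constraints 
of the instrument stream and the output file format.

```python
class WriteAggregator:
    """Collect small records and pass them to a file in large chunks.

    Hypothetical helper: the name and the default chunk size are
    illustrative assumptions, not taken from the application.
    """

    def __init__(self, fileobj, chunk_size=1 << 20):
        self._file = fileobj
        self._chunk_size = chunk_size
        self._buf = bytearray()
        self.write_count = 0  # number of writes actually issued to the file

    def write(self, record):
        """Buffer a small record; emit only whole chunks to the file."""
        self._buf += record
        while len(self._buf) >= self._chunk_size:
            self._emit(self._chunk_size)

    def _emit(self, nbytes):
        # Issue one large write and drop those bytes from the buffer.
        self._file.write(bytes(self._buf[:nbytes]))
        del self._buf[:nbytes]
        self.write_count += 1

    def close(self):
        """Write out any remaining partial chunk and flush the file."""
        if self._buf:
            self._emit(len(self._buf))
        self._file.flush()
```

The caller replaces each small write to the file with a call to 
agg.write(record) and calls agg.close() at the end, so the filesystem 
sees far fewer, larger writes.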

> - have the application do 4kB O_DIRECT sized IOs to the file and disable
>   locking on the output file.  That will avoid partial-page IO submissions,
>   and by disabling locking you will at least avoid the contention between
>   the clients.

I'll try this out. Luckily, no application-level locking is being done 
at this time.
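
A minimal sketch of what page-sized O_DIRECT writes might look like, 
assuming Linux and Python. The helper names (padded_length, 
aligned_buffer, write_direct) are hypothetical, and the 4096-byte size 
simply follows the 4kB suggestion above. O_DIRECT generally requires the 
buffer address, the transfer length, and the file offset to be suitably 
aligned, so the data is copied into an anonymous mmap, which is 
page-aligned. Disabling locking is not shown, since the thread does not 
say how that is done.

```python
import mmap
import os

PAGE = 4096  # assumed IO size, following the 4kB suggestion


def padded_length(nbytes, page=PAGE):
    """Round nbytes up to the next multiple of page."""
    return (nbytes + page - 1) // page * page


def aligned_buffer(data, page=PAGE):
    """Copy data into a zero-padded anonymous mmap (page-aligned memory)."""
    buf = mmap.mmap(-1, padded_length(max(len(data), 1), page))
    buf[:len(data)] = data
    return buf


def write_direct(path, data, offset=0, page=PAGE):
    """Write data at a page-aligned offset using O_DIRECT (Linux only).

    The data is padded with zero bytes up to a whole number of pages,
    so the file may end up longer than len(data).
    """
    if offset % page:
        raise ValueError("offset must be a multiple of the page size")
    buf = aligned_buffer(data, page)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o644)
    try:
        return os.pwrite(fd, buf, offset)
    finally:
        os.close(fd)
        buf.close()
```

Not every filesystem accepts O_DIRECT, so the open or the write can fail 
with EINVAL on some of them, and the zero padding means the output 
format has to tolerate page-sized records.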

> - I thought there was also an option to have clients do lockless/uncached
>   IO without changing the app, but I can't recall the details on how to
>   activate it.  Possibly another of the Lustre engineers will recall.

I'd be interested in finding out how to do that.

> - add more disks, or use SSD disks for the OSTs.  This will improve your
>   IOPS rate dramatically.  It probably makes sense to create larger OSTs
>   rather than many smaller OSTs due to less overhead (journal, connections,
>   etc).

I have been wondering about the effect SSD disks might have. 
Unfortunately, for now, I need to show that it's worth my time to keep 
working on a Lustre solution.

> - using MPI-IO might also help

MPI-IO is already on my list of things to try.

Thanks again.

-- 
Martin


