[Lustre-discuss] Read/Write performance problem

Andreas Dilger adilger at sun.com
Tue Oct 6 08:33:27 PDT 2009


On Oct 06, 2009  13:24 +0200, Michael Kluge wrote:
> We are running Lustre 1.6.5.1. The problem shows up when we read a
> shared file from multiple nodes that has just been written from the same
> set of nodes. 512 processes write a checkpoint (1.5 GB from each node)
> into a shared file by seeking to position RANK*1.5GB and writing 1.5GB
> in 1.44 MB chunks. Writing works fine and gives the full file system
> performance. The data is written using write() with no flags other
> than O_CREAT and O_WRONLY. Once the checkpoint is written, the program
> is terminated and restarted, and reads in the same portion of the file. For
> some reason this almost immediate reading of the same data that was just
> written on the same node is very slow. If we a) change the set of nodes
> or b) wait a day, we get the full read performance when we use the same
> executable and the same shared file. 
> 
> Is there a reason why an immediate read after a write on the same node
> from/to a shared file is slow? Is there any additional communication,
> e.g. is the client flushing the buffer cache before the first read? The
> statistics show that the average time to complete a 1.44MB read request
> is increasing during the runtime of our program. At some point it hits
> an upper limit or a saturation point and stays there. Is there some kind
> of queue that fills up in this kind of write/read scenario? Is there
> anything tunable in /proc/fs/lustre?
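
For reference, the access pattern described above can be sketched roughly
as follows. This is a minimal Python sketch, not the poster's actual code:
the function names write_region and read_region, the file mode 0o644, and
the fill byte are assumptions made for illustration. Each rank seeks to
rank * region_size in the shared file and transfers its region in
fixed-size chunks with plain write()/read() calls.

```python
import os


def write_region(path, rank, region_size, chunk_size, fill=b"\0"):
    """Write one rank's region of a shared file in chunk_size pieces."""
    # Only O_CREAT and O_WRONLY, matching the flags named in the post.
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        # Each rank owns the byte range [rank*region_size, (rank+1)*region_size).
        os.lseek(fd, rank * region_size, os.SEEK_SET)
        remaining = region_size
        while remaining > 0:
            buf = fill * min(chunk_size, remaining)
            written = 0
            # write() may be short, so loop until the whole chunk is out.
            while written < len(buf):
                written += os.write(fd, buf[written:])
            remaining -= len(buf)
    finally:
        os.close(fd)


def read_region(path, rank, region_size, chunk_size):
    """Read back the same region after a restart, again in chunks."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, rank * region_size, os.SEEK_SET)
        chunks = []
        remaining = region_size
        while remaining > 0:
            data = os.read(fd, min(chunk_size, remaining))
            if not data:  # unexpected end of file
                break
            chunks.append(data)
            remaining -= len(data)
        return b"".join(chunks)
    finally:
        os.close(fd)
```

In the scenario from the post, region_size would be about 1.5 GB and
chunk_size about 1.44 MB, with one call per process; timing each os.read()
call in read_region is one way to collect the per-request read times
mentioned above.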

One possible issue is that you don't have enough spare RAM to cache the
1.5GB of checkpoint data, so during the write it is being flushed to the
OSTs and evicted from cache.  When you immediately restart there is still
dirty data being written from the clients, and those writes contend with
the reads issued by the restart.
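
One way to check whether leftover dirty data is the cause is to force the
flush before the writer exits and to time it separately. The following is
a minimal Python sketch under that assumption; the function name
write_and_flush and the file mode are made up for illustration, and fsync()
is used here only as a generic POSIX way to push a file's dirty pages out
of the client cache before the call returns.

```python
import os
import time


def write_and_flush(path, data):
    """Write data, then fsync; return (seconds in write, seconds incl. fsync)."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        start = time.monotonic()
        view = memoryview(data)
        # write() may be short, so keep going until everything is handed over.
        while len(view) > 0:
            view = view[os.write(fd, view):]
        t_write = time.monotonic() - start
        # Block until the dirty pages for this file have been written out.
        os.fsync(fd)
        t_total = time.monotonic() - start
    finally:
        os.close(fd)
    return t_write, t_total
```

If t_total is much larger than t_write, a large part of the checkpoint was
still sitting in the client cache when write() returned, which would be
consistent with the contention described above.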

As a general rule, avoiding unnecessary IO (i.e. reading back data that
was just written) reduces the time that the application is not doing
useful work (i.e. computing).


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



