[Lustre-discuss] Bad read performance

Fri Aug 21 06:15:36 PDT 2009

Alvaro Aguilera wrote:
> thanks for the hint, but unfortunately I can't make any updates to the 
> cluster...
>
> Do you think both of the problems I experienced are bugs in Lustre and 
> are resolved in current versions?
They should be Lustre bugs. Do the 2 processes run on different nodes or 
on the same node?

Thanks
WangDi
>
> Thanks.
> Alvaro.
>
> On Fri, Aug 21, 2009 at 6:32 AM, di wang <di.wang at sun.com> wrote:
>
>     Hello,
>
>     See bug 17197 and try applying this patch
>     https://bugzilla.lustre.org/attachment.cgi?id=25062 to your
>     Lustre source. Or you can wait for 1.8.2.
>
>     Thanks
>     Wangdi
>
>     Alvaro Aguilera wrote:
>
>         Hello,
>
>         As a project for college, I'm doing a behavioral comparison
>         between Lustre and CXFS when dealing with simple strided files
>         using POSIX semantics. In one of the tests, each participating
>         process reads 16 chunks of data, 32MB each, from a common,
>         strided file using the following code:
>
>         ------------------------------------------------------------------------------------------
>         int myfile = open("thefile", O_RDONLY);
>         if (myfile < 0)
>               perror("open");
>
>         // the barriers are only there to help measure time
>         MPI_Barrier(MPI_COMM_WORLD);
>
>         // after each read, skip the chunks that belong to the other ranks
>         off_t distance = (off_t)(numtasks-1)*p.buffersize;
>         off_t offset = (off_t)rank*p.buffersize;
>
>         int j;
>         lseek(myfile, offset, SEEK_SET);
>         for (j = 0; j < p.buffercount; j++) {
>               // buffers are aligned to the page size
>               if (read(myfile, buffers[j], p.buffersize) < 0)
>                     perror("read");
>               lseek(myfile, distance, SEEK_CUR);
>         }
>
>         MPI_Barrier(MPI_COMM_WORLD);
>
>         close(myfile);
>         ------------------------------------------------------------------------------------------
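One way to double-check the access pattern of the loop above is to compute each chunk's absolute offset and use pread() instead of the lseek()/read() pair. This is only a sketch: strided_offset and read_strided are hypothetical helper names, and it assumes that chunk j of a rank starts at (j*numtasks + rank)*buffersize, which is the position the quoted loop arrives at.

```c
#define _XOPEN_SOURCE 700      /* for pread() */
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t on 32-bit systems */
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: absolute file offset of chunk j for a given rank. */
off_t strided_offset(int rank, int numtasks, int j, size_t buffersize)
{
    assert(rank >= 0 && rank < numtasks && j >= 0);
    return ((off_t)j * numtasks + rank) * (off_t)buffersize;
}

/* Hypothetical helper: read `count` chunks with pread(), retrying after
 * short reads. Returns 0 on success, -1 on error or premature end of file. */
int read_strided(int fd, char **buffers, int count, size_t buffersize,
                 int rank, int numtasks)
{
    for (int j = 0; j < count; j++) {
        off_t base = strided_offset(rank, numtasks, j, buffersize);
        size_t done = 0;
        while (done < buffersize) {
            ssize_t n = pread(fd, buffers[j] + done, buffersize - done,
                              base + (off_t)done);
            if (n <= 0)
                return -1;
            done += (size_t)n;
        }
    }
    return 0;
}
```

With 2 processes and 32MB buffers, rank 1 would read at 32MB, 96MB, 160MB and so on. Since pread() does not move the file position, the access pattern no longer depends on the result of a previous lseek().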
>
>         I'm facing the following problem: when this code is run in
>         parallel the read operations on certain processes start to
>         need more and more time to complete. I attached a graphical
>         trace of this, when using only 2 processes.
>         As you see, the read operations on process 0 stay more or less
>         constant, taking about 0.12 seconds to complete, while on
>         process 1 they increase up to 39 seconds!
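To pin down which individual calls slow down, each read() can be timed separately with a monotonic clock. A minimal sketch, assuming POSIX clock_gettime() is available on the compute nodes; timed_read and elapsed_seconds are hypothetical names and are not part of the test program quoted above.

```c
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() */
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Difference between two timespec values, in seconds. */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Hypothetical wrapper: performs one read() and stores its wall-clock
 * duration in *seconds. Returns whatever read() returned. */
ssize_t timed_read(int fd, void *buf, size_t len, double *seconds)
{
    struct timespec t0, t1;
    assert(seconds != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    ssize_t n = read(fd, buf, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *seconds = elapsed_seconds(t0, t1);
    return n;
}
```

Printing the rank, the chunk index and the measured time for every call would show whether the slowdown grows steadily or starts at a particular chunk. On older glibc versions, linking may need -lrt.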
>
>         If I run the program with only one process, then the time
>         stays at ~0.12 seconds per read operation. The problem doesn't
>         appear if the O_DIRECT flag is used.
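For comparison, the O_DIRECT variant mentioned above bypasses the client page cache and, on Linux, generally needs the buffer, the file offset and the length to be suitably aligned. A sketch under those assumptions: open_direct and alloc_aligned are hypothetical helpers, and aligning to the page size is an assumption that should be checked against the file system in use.

```c
#define _GNU_SOURCE   /* exposes O_DIRECT on Linux */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: open a file read-only, bypassing the page cache.
 * Returns the file descriptor, or -1 on error. */
int open_direct(const char *path)
{
    return open(path, O_RDONLY | O_DIRECT);
}

/* Hypothetical helper: allocate `size` bytes aligned to the page size.
 * Returns NULL on failure; release the buffer with free(). */
void *alloc_aligned(size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    void *buf = NULL;
    assert(page > 0);
    if (posix_memalign(&buf, (size_t)page, size) != 0)
        return NULL;
    return buf;
}
```

The 32MB chunk size and the chunk offsets used in the test are multiples of the page size, so only the buffers need special care.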
>
>         Can somebody explain to me why this is happening? Since I'm
>         very new to Lustre, I may be making some silly mistakes, so be
>         nice to me ;)
>
>         I'm using Lustre on SLES 10 Patchlevel 1, Kernel
>         2.6.16.54-0.2.5_lustre.1.6.5.1.
>
>
>         Thanks!
>
>         Alvaro Aguilera.
>
>