[Lustre-discuss] Bad read performance

Fri Aug 21 02:09:29 PDT 2009

Thanks for the hint, but unfortunately I can't make any updates to the
cluster...

Do you think both of the problems I experienced are bugs in Lustre and are
resolved in current versions?

Thanks.
Alvaro.

On Fri, Aug 21, 2009 at 6:32 AM, di wang <di.wang at sun.com> wrote:

> Hello,
>
> You may want to look at bug 17197 and try applying this patch
> https://bugzilla.lustre.org/attachment.cgi?id=25062 to your Lustre source.
> Or you can wait for 1.8.2.
>
> Thanks
> Wangdi
>
> Alvaro Aguilera wrote:
>
>> Hello,
>>
>> as a college project I'm doing a behavioral comparison between Lustre
>> and CXFS when dealing with simple strided files using POSIX semantics. In
>> one of the tests, each participating process reads 16 chunks of 32 MB
>> each from a common, strided file using the following code:
>>
>>
>> ------------------------------------------------------------------------------------------
>> int myfile = open("thefile", O_RDONLY);
>>
>> // the barriers are only there to help with measuring time
>> MPI_Barrier(MPI_COMM_WORLD);
>>
>> off_t distance = (numtasks-1)*p.buffersize;
>> off_t offset = rank*p.buffersize;
>>
>> int j;
>> lseek(myfile, offset, SEEK_SET);
>> for (j = 0; j < p.buffercount; j++) {
>>       // buffers are aligned to the page size
>>       read(myfile, buffers[j], p.buffersize);
>>       lseek(myfile, distance, SEEK_CUR);
>> }
>>
>> MPI_Barrier(MPI_COMM_WORLD);
>>
>> close(myfile);
>>
>> ------------------------------------------------------------------------------------------
>>
>> I'm facing the following problem: when this code is run in parallel the
>> read operations on certain processes start to need more and more time to
>> complete. I attached a graphical trace of this, when using only 2 processes.
>> As you see, the read operations on process 0 stay more or less constant,
>> taking about 0.12 seconds to complete, while on process 1 they increase up
>> to 39 seconds!
>>
>> If I run the program with only one process, then the time stays at ~0.12
>> seconds per read operation. The problem doesn't appear if the O_DIRECT flag
>> is used.
>>
>> Can somebody explain to me why this is happening? Since I'm very new to
>> Lustre, I may be making some silly mistakes, so be nice to me ;)
>>
>> I'm using Lustre on SLES 10 Patchlevel 1, kernel
>> 2.6.16.54-0.2.5_lustre.1.6.5.1.
>>
>>
>> Thanks!
>>
>> Alvaro Aguilera.
>>
>>
>
>
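
For anyone who wants to reproduce the access pattern from the quoted test
without MPI, here is a minimal sketch in C. It is not the original poster's
code: it swaps the lseek()/read() pair for pread(), checks for short reads,
and prints the time of every read so a slowdown on one rank shows up directly.
The function names chunk_offset, read_chunk and read_strided are made up for
this illustration. It assumes, as in the quoted code, that rank r reads chunks
r, r + numtasks, r + 2*numtasks, and so on. It only measures the behaviour; it
does not work around it.

```c
#define _POSIX_C_SOURCE 200809L /* for pread() and clock_gettime() */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Byte offset of the j-th chunk read by `rank` (hypothetical helper).
 * Ranks interleave chunk by chunk, so the stride is numtasks * buffersize. */
off_t chunk_offset(int rank, int numtasks, int j, size_t buffersize)
{
    assert(numtasks > 0 && rank >= 0 && rank < numtasks && j >= 0);
    return ((off_t)j * numtasks + rank) * (off_t)buffersize;
}

/* Read exactly `size` bytes at `offset`, continuing after short reads.
 * Returns 0 on success, -1 on error or premature end of file. */
int read_chunk(int fd, char *buf, size_t size, off_t offset)
{
    size_t done = 0;
    while (done < size) {
        ssize_t n = pread(fd, buf + done, size - done, offset + (off_t)done);
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    return 0;
}

/* Read `count` strided chunks for one rank and print the duration of each
 * read to stderr. Returns 0 on success, -1 as soon as one read fails. */
int read_strided(int fd, char **buffers, int count, size_t buffersize,
                 int rank, int numtasks)
{
    for (int j = 0; j < count; j++) {
        struct timespec t0, t1;
        off_t offset = chunk_offset(rank, numtasks, j, buffersize);

        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (read_chunk(fd, buffers[j], buffersize, offset) != 0)
            return -1;
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (double)(t1.tv_sec - t0.tv_sec)
                    + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
        fprintf(stderr, "rank %d chunk %d: %.6f s\n", rank, j, secs);
    }
    return 0;
}
```

In the MPI program this would be called once per process between the two
barriers, for example read_strided(myfile, buffers, p.buffercount,
p.buffersize, rank, numtasks). Because pread() takes an explicit offset, the
file position is never shared state between calls, which makes the offsets
easy to log and compare across ranks.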