[Lustre-discuss] Bad read performance

di wang di.wang at sun.com
Thu Sep 3 17:23:58 PDT 2009


Hello,

The miss_inside_window vs. hits ratio is about 3 to 2, which is indeed too 
high. It probably means that a lot of pages are read in by read-ahead but 
evicted before they are actually accessed. 
So the patch in bug 17197 probably fixes this problem; it will be 
included in 1.8.2.
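
(A minimal sketch of the arithmetic behind that 3-to-2 figure, with the 
counters hard-coded from the before/after read_ahead_stats snapshots 
quoted below; illustrative only.)

    #include <stdio.h>

    int main(void)
    {
        long hits_before = 7301235, hits_after = 7506005;
        long miss_before = 1,       miss_after = 319450;

        long hit_delta  = hits_after - hits_before;    /* ~204770 new hits   */
        long miss_delta = miss_after - miss_before;    /* ~319449 new misses */

        printf("miss_inside_window : hits = %ld : %ld (about %.2f : 1)\n",
               miss_delta, hit_delta, (double)miss_delta / (double)hit_delta);
        return 0;
    }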

Thanks
WangDi


Alvaro Aguilera wrote:
> hi,
>
> here is the requested information:
>
> before test:
>
> llite.fastfs-ffff810102a6a400.read_ahead_stats=
> snapshot_time:         1251851453.382275 (secs.usecs)
> pending issued pages:           0
> hits                      7301235
> misses                    10546
> readpage not consecutive  14369
> miss inside window        1
> failed grab_cache_page    6285314
> failed lock match         0
> read but discarded        98955
> zero length file          0
> zero size window          3495
> read-ahead to EOF         172
> hit max r-a issue         783042
> wrong page from grab_cache_page 0
>
>
> after:
>
> llite.fastfs-ffff810102a6a400.read_ahead_stats=
> snapshot_time:         1251851620.183964 (secs.usecs)
> pending issued pages:           0
> hits                      7506005
> misses                    330064
> readpage not consecutive  14432
> miss inside window        319450
> failed grab_cache_page    6322954
> failed lock match         17294
> read but discarded        98955
> zero length file          0
> zero size window          3495
> read-ahead to EOF         192
> hit max r-a issue         837908
> wrong page from grab_cache_page 0
>
>
> It looks like there are a lot of misses, as well as a locking problem, 
> doesn't it? Btw., in the test, 4 processes each read 512 MB from a 
> 2 GB file.
>
> Regards,
> Alvaro.
>
> On Fri, Aug 21, 2009 at 3:38 PM, di wang <di.wang at sun.com> wrote:
>
>     hello,
>
>     Alvaro Aguilera wrote:
>
>         they run on different physical nodes and access the ost via 4x
>         infiniband.
>
>     I have never heard of such problems when the processes are on
>     different nodes.  Could it be client memory?
>     Can you post the read-ahead stats (before and after the test) here,
>     obtained with
>
>     lctl get_param llite.*.read_ahead_stats
>
>
>     But there are indeed a lot of fixes for strided read since 1.6.5,
>     which are included in the tar ball I posted below, and they can
>     probably fix your problem.
>
>     Thanks
>     WangDi
>
>         On Fri, Aug 21, 2009 at 3:15 PM, di wang <di.wang at sun.com> wrote:
>
>            Alvaro Aguilera wrote:
>
>                thanks for the hint, but unfortunately I can't make any
>                updates to the cluster...
>
>                Do you think both of the problems I experienced are bugs in
>                Lustre and are resolved in current versions?
>
>            It should be a Lustre bug. Do the 2 processes run on
>            different nodes or on the same node?
>
>            Thanks
>            WangDi
>
>
>                Thanks.
>                Alvaro.
>
>
>                On Fri, Aug 21, 2009 at 6:32 AM, di wang
>                <di.wang at sun.com> wrote:
>
>                   Hello,
>
>                   You may look at bug 17197 and try to apply this patch
>                   https://bugzilla.lustre.org/attachment.cgi?id=25062
>                   to your Lustre source, or you can wait for 1.8.2.
>
>                   Thanks
>                   Wangdi
>
>                   Alvaro Aguilera wrote:
>
>                       Hello,
>
>                       as a project for college I'm doing a behavioral
>                       comparison between Lustre and CXFS when dealing
>                       with simple strided files using POSIX semantics.
>                       In one of the tests, each participating process
>                       reads 16 chunks of data of 32 MB each from a
>                       common, strided file using the following code:
>
>                       ----------------------------------------------------------------
>                       int myfile = open("thefile", O_RDONLY);
>
>                       MPI_Barrier(MPI_COMM_WORLD); // the barriers are only
>                                                    // to help measuring time
>
>                       off_t distance = (numtasks-1)*p.buffersize;
>                       off_t offset   = rank*p.buffersize;
>
>                       int j;
>                       lseek(myfile, offset, SEEK_SET);
>                       for (j = 0; j < p.buffercount; j++) {
>                             read(myfile, buffers[j], p.buffersize); // buffers are aligned
>                                                                     // to the page size
>                             lseek(myfile, distance, SEEK_CUR);
>                       }
>
>                       MPI_Barrier(MPI_COMM_WORLD);
>
>                       close(myfile);
>                       ----------------------------------------------------------------
>
>                       I'm facing the following problem: when this code
>                       is run in parallel, the read operations on certain
>                       processes start to need more and more time to
>                       complete. I attached a graphical trace of this,
>                       when using only 2 processes.
>                       As you see, the read operations on process 0 stay
>                       more or less constant, taking about 0.12 seconds
>                       to complete, while on process 1 they increase up
>                       to 39 seconds!
>
>                       If I run the program with only one process, then
>                       the time stays at ~0.12 seconds per read
>                       operation. The problem doesn't appear if the
>                       O_DIRECT flag is used.
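>
>                       (For completeness, here is a rough sketch of how
>                       the O_DIRECT variant of the read loop above could
>                       look; it is illustrative only, not the exact code
>                       I ran, and it assumes the buffers are page-aligned,
>                       e.g. allocated with posix_memalign.)
>
>                       ----------------------------------------------------------------
>                       /* sketch: O_DIRECT variant of the read loop above */
>                       #define _GNU_SOURCE                 /* for O_DIRECT */
>                       #include <fcntl.h>
>                       #include <unistd.h>
>
>                       void read_strided_direct(int rank, int numtasks,
>                                                size_t bufsize, int bufcount,
>                                                char **buffers)
>                       {
>                           /* buffers[] must be page-aligned, e.g. from
>                              posix_memalign(&buffers[j], 4096, bufsize) */
>                           int myfile = open("thefile", O_RDONLY | O_DIRECT);
>
>                           off_t distance = (off_t)(numtasks - 1) * bufsize;
>                           off_t offset   = (off_t)rank * bufsize;
>
>                           int j;
>                           lseek(myfile, offset, SEEK_SET);
>                           for (j = 0; j < bufcount; j++) {
>                                 read(myfile, buffers[j], bufsize);
>                                 lseek(myfile, distance, SEEK_CUR);
>                           }
>                           close(myfile);
>                       }
>                       ----------------------------------------------------------------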
>
>                       Can somebody explain to me why this is
>                       happening? Since I'm very new to Lustre, I may be
>                       making some silly mistakes, so be nice to me ;)
>
>                       I'm using Lustre on SLES 10 Patchlevel 1, kernel
>                       2.6.16.54-0.2.5_lustre.1.6.5.1.
>
>
>                       Thanks!
>
>                       Alvaro Aguilera.
>
>
>
>                       _______________________________________________
>                       Lustre-discuss mailing list
>                       Lustre-discuss at lists.lustre.org
>                       http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>   




More information about the lustre-discuss mailing list