[Lustre-discuss] Bad read performance
di wang
di.wang at sun.com
Thu Sep 3 17:23:58 PDT 2009
Hello,
Miss_inside_window vs. hits is about 3 to 2 over the test run
(319450 - 1 = 319449 misses inside the window vs. 7506005 - 7301235 =
204770 hits), which is indeed too high. It probably means a lot of
pages are read in by read-ahead but evicted again before they are
actually accessed.
So the patch in bug 17197 probably fixes this problem; it will be
included in 1.8.2.
Thanks
WangDi
Alvaro Aguilera wrote:
> hi,
>
> here is the requested information:
>
> before test:
>
> llite.fastfs-ffff810102a6a400.read_ahead_stats=
> snapshot_time: 1251851453.382275 (secs.usecs)
> pending issued pages: 0
> hits 7301235
> misses 10546
> readpage not consecutive 14369
> miss inside window 1
> failed grab_cache_page 6285314
> failed lock match 0
> read but discarded 98955
> zero length file 0
> zero size window 3495
> read-ahead to EOF 172
> hit max r-a issue 783042
> wrong page from grab_cache_page 0
>
>
> after:
>
> llite.fastfs-ffff810102a6a400.read_ahead_stats=
> snapshot_time: 1251851620.183964 (secs.usecs)
> pending issued pages: 0
> hits 7506005
> misses 330064
> readpage not consecutive 14432
> miss inside window 319450
> failed grab_cache_page 6322954
> failed lock match 17294
> read but discarded 98955
> zero length file 0
> zero size window 3495
> read-ahead to EOF 192
> hit max r-a issue 837908
> wrong page from grab_cache_page 0
>
>
> there seem to be a lot of misses, as well as a locking problem,
> don't there? Btw, in this test 4 processes each read 512 MB from a
> 2 GB file.
>
> Regards,
> Alvaro.
>
> On Fri, Aug 21, 2009 at 3:38 PM, di wang <di.wang at sun.com> wrote:
>
> hello,
>
> Alvaro Aguilera wrote:
>
> they run on different physical nodes and access the OSTs via 4x
> InfiniBand.
>
> I have never heard of such problems if they run on different nodes.
> How much memory do the clients have?
> Can you post the read-ahead stats (before and after the test) here, via
>
> lctl get_param llite.*.read_ahead_stats
>
>
> But there have indeed been a lot of fixes for strided reads since
> 1.6.5, which are included in the tarball I posted below.
> They can probably fix your problem.
>
> Thanks
> WangDi
>
> On Fri, Aug 21, 2009 at 3:15 PM, di wang <di.wang at sun.com> wrote:
>
> Alvaro Aguilera wrote:
>
> thanks for the hint, but unfortunately I can't make any
> updates to the cluster...
>
> Do you think both of the problems I experienced are bugs in
> Lustre and are resolved in current versions?
>
> These should be Lustre bugs. Do the 2 processes run on the same
> node or on different nodes?
>
> Thanks
> WangDi
>
>
> Thanks.
> Alvaro.
>
>
> On Fri, Aug 21, 2009 at 6:32 AM, di wang <di.wang at sun.com> wrote:
>
> Hello,
>
> You may look at bug 17197 and try to apply this patch
> https://bugzilla.lustre.org/attachment.cgi?id=25062
> to your Lustre source. Or you can wait for 1.8.2.
>
> Thanks
> Wangdi
>
> Alvaro Aguilera wrote:
>
> Hello,
>
> As a project for college I'm doing a behavioral comparison between
> Lustre and CXFS when dealing with simple strided files using POSIX
> semantics. In one of the tests, each participating process reads 16
> chunks of data of 32 MB each from a common strided file, using the
> following code:
>
>
> ------------------------------------------------------------------------------------------
> int myfile = open("thefile", O_RDONLY);
>
> MPI_Barrier(MPI_COMM_WORLD); // the barriers are only to help measuring time
>
> off_t distance = (numtasks-1)*p.buffersize;
> off_t offset = rank*p.buffersize;
>
> int j;
> lseek(myfile, offset, SEEK_SET);
> for (j = 0; j < p.buffercount; j++) {
>     read(myfile, buffers[j], p.buffersize); // buffers are aligned to the page size
>     lseek(myfile, distance, SEEK_CUR);
> }
>
> MPI_Barrier(MPI_COMM_WORLD);
>
> close(myfile);
> ------------------------------------------------------------------------------------------
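
As an aside, read() may legally return fewer bytes than requested even
on a regular file, and the loop above assumes every call returns the
full 32 MB chunk. A minimal helper that retries on short reads and
EINTR could look like the sketch below (the name read_full and the
error handling are illustrative additions, not part of the original
test program):

------------------------------------------------------------------------------------------
#include <errno.h>
#include <unistd.h>

/* Read exactly `count` bytes, retrying on short reads and EINTR.
 * Returns 0 on success, -1 on error or premature end of file. */
static int read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    while (count > 0) {
        ssize_t n = read(fd, p, count);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, just retry */
            return -1;          /* real I/O error */
        }
        if (n == 0)
            return -1;          /* unexpected end of file */
        p += n;
        count -= (size_t)n;
    }
    return 0;
}
------------------------------------------------------------------------------------------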
>
> I'm facing the following problem: when this code is run in parallel,
> the read operations on certain processes start to need more and more
> time to complete. I attached a graphical trace of this when using
> only 2 processes.
> As you can see, the read operations on process 0 stay more or less
> constant, taking about 0.12 seconds to complete, while on process 1
> they increase up to 39 seconds!
>
> If I run the program with only one process, the time stays at ~0.12
> seconds per read operation. The problem doesn't appear if the
> O_DIRECT flag is used.
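
For reference, O_DIRECT on Linux bypasses the client page cache (and
with it the read-ahead machinery), but it requires the buffer, the
file offset and the transfer size to be suitably aligned, typically
to the page or filesystem block size. A minimal, self-contained
sketch of such an O_DIRECT read, independent of the original test
program (file name and buffer size are placeholders), might look like
this:

------------------------------------------------------------------------------------------
#define _GNU_SOURCE            /* needed for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    const size_t bufsize = 32 * 1024 * 1024;   /* 32 MB, like one chunk in the test */
    void *buf;

    /* O_DIRECT needs an aligned buffer; 4096 covers the usual page size. */
    if (posix_memalign(&buf, 4096, bufsize) != 0)
        return 1;

    int fd = open("thefile", O_RDONLY | O_DIRECT);
    if (fd < 0)
        return 1;

    ssize_t n = read(fd, buf, bufsize);        /* bypasses the page cache */
    (void)n;                                   /* a real program would check this */

    close(fd);
    free(buf);
    return 0;
}
------------------------------------------------------------------------------------------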
>
> Can somebody explain to me why this is happening? Since I'm very new
> to Lustre, I may be making some silly mistakes, so be nice to me ;)
>
> I'm using Lustre on SLES 10 Patchlevel 1, kernel
> 2.6.16.54-0.2.5_lustre.1.6.5.1.
>
>
> Thanks!
>
> Alvaro Aguilera.
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss