[lustre-discuss] varying sequential read performance.

Patrick Farrell paf at cray.com
Tue Apr 3 04:41:06 PDT 2018


John,

There’s a simple explanation for the lack of a top-line performance benefit: you’re not reading 16 GB, then the next 16 GB, and so on from one OST at a time.  It’s interleaved.

Readahead issues large reads, much larger than your 1 MiB I/O size, so every actual read operation pulls interleaved data from all four sources.

So you’re effectively pulling from all four sources at the same time throughout, and one of them completing faster just means you wait for the others to finish their work.  The effect would be more obvious with four independent files read in parallel, where you’d see one file complete first.  This is subtler, but it’s the same effect.
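To make the interleaving concrete, here is a minimal shell sketch (assuming the stripeSize=4M, stripeCount=4 layout from John's post) that prints which stripe object each sequential 4 MiB chunk of the file maps to; any readahead window wider than stripe_size * stripe_count necessarily touches all four objects:

   # round-robin mapping of sequential 4 MiB chunks onto 4 stripe objects
   stripe_size_mb=4
   stripe_count=4
   for ((chunk=0; chunk<8; chunk++)); do
      echo "offset $((chunk * stripe_size_mb)) MiB -> stripe object $((chunk % stripe_count))"
   done

So no single OSC can run ahead of the others; the slowest object gates every readahead window.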

- Patrick


________________________________
From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of John Bauer <bauerj at iodoctors.com>
Sent: Tuesday, April 3, 2018 1:23:30 AM
To: Colin Faber
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [lustre-discuss] varying sequential read performance.

Colin

Since I do not have root privileges on the system, I do not have access to /proc/sys/vm/drop_caches.  So, no, I do not flush the cache between the dd runs.  The 10 dd runs were done in a single job submission, and the scheduler drops caches between jobs, so the first dd pass does start with a virgin cache.  What strikes me as odd is that the first dd run is the slowest and obviously must read all the data from the OSSs, which is confirmed by the plot I have added at the top, showing the total amount of data moved via lnet during the life of each dd process.  Notice that the second dd run, which the lnet stats indicate also moves the entire 64 GB file from the OSSs, is 3 times faster, despite having to work with a non-virgin cache.  Runs 4 through 10 each move only 48 GB via lnet because one of the OSCs keeps its entire 16 GB portion in cache across all the runs.  Even with that significant advantage for runs 4-10, you could never tell from the dd results: run 5 is only slightly faster than run 2, and run 7 is as slow as run 0.
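(For reference, for anyone with root who wants to force a cold cache between runs, the usual recipe is below; the last step is the Lustre-specific part, since cancelling the client's DLM locks also releases the pages they cover:)

   sync                                              # flush any dirty pages first
   echo 3 > /proc/sys/vm/drop_caches                 # drop page cache, dentries, inodes (root only)
   lctl set_param ldlm.namespaces.*.lru_size=clear   # cancel client DLM locks, freeing Lustre-cached pages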

John


[inline image: plot of the total data moved via lnet during the life of each dd process]

On 4/3/2018 12:20 AM, Colin Faber wrote:
Are you flushing cache between test runs?

On Mon, Apr 2, 2018, 6:06 PM John Bauer <bauerj at iodoctors.com> wrote:
I am running dd 10 times consecutively to read a 64 GB file (stripeCount=4, stripeSize=4M) on a Lustre client (version 2.10.3) that has 64 GB of memory.
The client node was dedicated.

for pass in 1 2 3 4 5 6 7 8 9 10
do
   dd of=/dev/null if=${file} count=128000 bs=512K
done
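For anyone reproducing this, the layout assumptions can be checked up front; lfs getstripe reports the stripe count and stripe size (4194304 bytes = 4 MiB here):

   lfs getstripe ${file}    # expect lmm_stripe_count: 4, lmm_stripe_size: 4194304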

Instrumentation of the I/O from dd reveals varying performance.  In the plot below, the bottom frame has wall time
on the X axis and the file position of the dd reads on the Y axis, with a dot plotted at the wall time and starting file position of every read.
The slopes of the lines indicate the data transfer rates, which vary from 475 MB/s to 1.5 GB/s.  The last 2 passes have sharp breaks
in performance, one increasing and one decreasing.

The top frame indicates the amount of memory used by each of the file's 4 OSCs over the course of the 10 dd runs.  Nothing terribly odd here, except that
one of the OSCs eventually has its entire stripe (16 GB) cached and then never gives any of it up.
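The client-side cache behavior can also be watched while the runs are in flight; a minimal sketch using the standard llite and ldlm parameters (names as of Lustre 2.10, so treat them as an assumption for other versions):

   lctl get_param llite.*.max_cached_mb          # client page-cache limit and current usage
   lctl get_param ldlm.namespaces.*.lock_count   # DLM locks held per namespace on the client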

I should mention that the file system has 320 OSTs.  I found LU-6370, which eventually gets into LRU management issues on systems with high
numbers of OSTs, leading to reduced RPC sizes.
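If shrinking RPCs are the suspicion, both the configured RPC size and the observed distribution are visible on the client (again assuming 2.10-era parameter names):

   lctl get_param osc.*.max_pages_per_rpc   # 1024 pages = 4 MiB bulk RPCs
   lctl get_param osc.*.rpc_stats           # histogram of pages per read/write RPC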

Any explanations for the varying performance?
Thanks,
John

[inline image: two-frame plot; bottom: wall time vs. starting file position of each dd read; top: memory used by each of the file's 4 OSCs]

--
I/O Doctors, LLC
507-766-0378
bauerj at iodoctors.com

_______________________________________________
lustre-discuss mailing list
lustre-discuss at lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


--
I/O Doctors, LLC
507-766-0378
bauerj at iodoctors.com

