<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">

</head>

<body>

John,<br>

<br>

There’s a simple explanation for the lack of top-line performance benefit: you’re not reading 16 GB, then 16 GB, then 16 GB, and so on.  The reads are interleaved across the stripes.<br>

<br>

Readahead will do large reads, much larger than your 512 KiB dd I/O size, so every actual read operation is interleaved across all four sources.<br>

<br>

You’re effectively pulling from all four sources at the same time throughout, so when one of them completes faster, you just wait for the others to get their work done.  A similar effect would be more obvious if you had four independent files being read in parallel, as you’d see the faster file complete first. This is subtler, but it’s the same effect.<br>
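<br>
A rough Python sketch of that effect, as a toy model rather than anything taken from the Lustre code. It assumes each readahead window is split evenly across the four OSCs and fetched in parallel, so the read finishes only when the slowest source does. The per-source rates and the function names are made up for illustration.<br>
<pre>
```python
# Toy model of a striped sequential read. The rates below are illustrative
# assumptions, not measurements from the system discussed in this thread.

def interleaved_time(per_source_gb, rates_gb_s):
    """All sources are read in parallel within every readahead window,
    so the whole read finishes only when the slowest source is done."""
    return max(per_source_gb / rate for rate in rates_gb_s)

def one_after_another_time(per_source_gb, rates_gb_s):
    """Hypothetical layout where the file is read as 16 GB from one source,
    then 16 GB from the next, and so on."""
    return sum(per_source_gb / rate for rate in rates_gb_s)

PER_OSC_GB = 16.0                    # 64 GB file, stripe count 4
cold = [0.5, 0.5, 0.5, 0.5]          # assumed GB/s from each OST
one_cached = [4.0, 0.5, 0.5, 0.5]    # one stripe served from client cache (assumed rate)

print(interleaved_time(PER_OSC_GB, cold))              # 32.0
print(interleaved_time(PER_OSC_GB, one_cached))        # 32.0, no gain from the cached stripe
print(one_after_another_time(PER_OSC_GB, cold))        # 128.0
print(one_after_another_time(PER_OSC_GB, one_cached))  # 100.0, the cached part would show up
```
</pre>
In the interleaved case the cached stripe changes nothing because the other three still set the pace; only in the hypothetical one-after-another layout would the cached 16 GB shorten the run.<br>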

<br>

- Patrick<br>

<br>

<br>

<hr style="display:inline-block;width:98%" tabindex="-1">

<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> lustre-discuss &lt;lustre-discuss-bounces@lists.lustre.org&gt; on behalf of John Bauer &lt;bauerj@iodoctors.com&gt;<br>

<b>Sent:</b> Tuesday, April 3, 2018 1:23:30 AM<br>

<b>To:</b> Colin Faber<br>

<b>Cc:</b> lustre-discuss@lists.lustre.org<br>

<b>Subject:</b> Re: [lustre-discuss] varying sequential read performance.</font>

<div> </div>

</div>

<div style="background-color:#FFFFFF">Colin<br>

<br>

Since I do not have root privileges on the system, I do not have access to dropcache.  So, no, I do not flush cache between the dd runs.  The 10 dd runs were done in a single<br>

job submission and the scheduler does dropcache between jobs, so the first of the dd passes does start with a virgin cache.  What strikes me as odd about this is that the first dd<br>

run is the slowest and obviously must read all the data from the OSSs, which is confirmed by the plot I have added to the top, which indicates the total amount of data moved<br>

via lnet during the life of each dd process.  Notice that the second dd run, which lnetstats indicates also moves the entire 64 GB file from the OSSs, is 3 times faster, and has<br>

to work with a non-virgin cache.  Runs 4 through 10 all move only 48GB via lnet because one of the OSCs keeps its entire 16GB that is needed in cache across all the runs.<br>

Even with the significant advantage that runs 4-10 have, you could never tell from the dd results.  Run 5 is slightly faster than run 2, and run 7 is as slow as run 0.<br>

<br>

John<br>

<br>

<br>

<img alt="" src="cid:part1.A7414AA4.17B94BD3@iodoctors.com"><br>

<br>

<div class="x_moz-cite-prefix">On 4/3/2018 12:20 AM, Colin Faber wrote:<br>

</div>

<blockquote type="cite">

<div dir="auto">Are you flushing cache between test runs?</div>

<br>

<div class="x_gmail_quote">

<div dir="ltr">On Mon, Apr 2, 2018, 6:06 PM John Bauer &lt;<a href="mailto:bauerj@iodoctors.com">bauerj@iodoctors.com</a>&gt; wrote:<br>

</div>

<blockquote class="x_gmail_quote" style="margin:0 0 0

          .8ex; border-left:1px #ccc solid; padding-left:1ex">

<div bgcolor="#FFFFFF">I am running dd 10 times consecutively to read a 64GB file (stripeCount=4, stripeSize=4M) on a Lustre client (version 2.10.3) that has 64GB of memory.<br>

The client node was dedicated.<br>

<br>

<b><font face="Courier New, Courier, monospace">for pass in 1 2 3 4 5 6 7 8 9 10<br>

do<br>

   dd of=/dev/null if=${file} count=128000 bs=512K<br>

done<br>

</font></b><br>
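<br>
For reference, a small Python sketch of how file offsets map onto the four stripes under the layout above, assuming plain round-robin (RAID-0 style) striping. The helper names are invented for this illustration and are not Lustre APIs, and the 16 MiB span is only an assumed example of a large read.<br>
<pre>
```python
# Sketch of round-robin striping for stripeCount=4, stripeSize=4M.
# Assumes a plain RAID-0 style layout; helper names are illustrative only.

MIB = 1024 * 1024
STRIPE_SIZE = 4 * MIB
STRIPE_COUNT = 4

def stripe_index(offset):
    """Index (0..3) of the stripe object holding the byte at this file offset."""
    return (offset // STRIPE_SIZE) % STRIPE_COUNT

def stripes_touched(offset, length):
    """Sorted stripe indexes covered by a read of `length` bytes at `offset`."""
    first_chunk = offset // STRIPE_SIZE
    last_chunk = (offset + length - 1) // STRIPE_SIZE
    return sorted({chunk % STRIPE_COUNT for chunk in range(first_chunk, last_chunk + 1)})

# A single aligned 512K dd read stays inside one 4M chunk, so it maps to one stripe.
print(stripes_touched(0, 512 * 1024))        # [0]
# The next 4M chunk lives on the next stripe.
print(stripe_index(4 * MIB))                 # 1
# A 16 MiB span (an assumed example of a large read) covers all four stripes.
print(stripes_touched(0, 16 * MIB))          # [0, 1, 2, 3]
```
</pre>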

Instrumentation of the I/O from dd reveals varying performance.  In the plot below, the bottom frame has wall time<br>

on the X axis, and file position of the dd reads on the Y axis, with a dot plotted at the wall time and starting file position of every read. 

<br>

The slopes of the lines indicate the data transfer rate, which varies from 475MB/s to 1.5GB/s.  The last 2 passes have sharp breaks<br>

in the performance, one with increasing performance, and one with decreasing performance.<br>

<br>

The top frame indicates the amount of memory used by each of the file's 4 OSCs over the course of the 10 dd runs.  Nothing terribly odd here except that<br>

one of the OSCs eventually has its entire stripe (16GB) cached and then never gives any of it up.<br>

<br>

I should mention that the file system has 320 OSTs.  I found LU-6370, which eventually gets into LRU management issues on systems with high<br>
numbers of OSTs leading to reduced RPC sizes.<br>

<br>

Any explanations for the varying performance?<br>

Thanks, <br>

John<br>

<br>

<img alt="" src="cid:part1.FE7755F6.C36ADB75@iodoctors.com">

<pre class="x_m_179390239102776222moz-signature" cols="72">-- 

I/O Doctors, LLC

507-766-0378

<a class="x_m_179390239102776222moz-txt-link-abbreviated" href="mailto:bauerj@iodoctors.com" target="_blank" rel="noreferrer">bauerj@iodoctors.com</a></pre>

</div>


</blockquote>

</div>

</blockquote>

<br>

<pre class="x_moz-signature" cols="72">-- 

I/O Doctors, LLC

507-766-0378

<a class="x_moz-txt-link-abbreviated" href="mailto:bauerj@iodoctors.com">bauerj@iodoctors.com</a></pre>

</div>

</body>

</html>