Or would it be better to increase the stripe count for my Lustre filesystem to the maximum number of OSTs?<br><br><div class="gmail_quote">On Wed, Mar 3, 2010 at 3:27 PM, Jagga Soorma <span dir="ltr"><<a href="mailto:jagga13@gmail.com">jagga13@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="im">On Wed, Mar 3, 2010 at 2:30 PM, Andreas Dilger <span dir="ltr"><<a href="mailto:adilger@sun.com" target="_blank">adilger@sun.com</a>></span> wrote:<br>

</div><div class="gmail_quote"><div class="im"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div>On 2010-03-03, at 12:50, Jagga Soorma wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

I have just deployed a new Lustre FS with 2 MDS servers, 2 active OSS servers (5x2TB OSTs per OSS) and 16 compute nodes.<br>

</blockquote>

<br></div>

Does this mean you are using 5 2TB disks in a single RAID-5 OST per OSS (i.e. total OST size is 8TB), or are you using 5 separate 2TB OSTs?</blockquote></div><div><br>No, I am using 5 independent 2TB OSTs per OSS.<br>

 <br>

</div><div class="im"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div><br>

<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

Attached are our findings from the iozone tests. The iozone throughput tests demonstrate almost linear scalability of Lustre, except when WRITING files larger than 128MB. When multiple clients create/write files larger than 128MB, Lustre throughput levels off at approximately 1GB/s. We observed this with almost all tested block sizes except 4KB. I don't have any explanation for why Lustre performs poorly when writing large files.<br>


<br>

Has anyone experienced this behaviour? Any comments on our findings?<br>

</blockquote>

<br>

<br></div>

The default client tunable is max_dirty_mb=32MB per OSC (i.e. the maximum amount of unwritten dirty data per OST before the process submitting IO blocks). If you have 2 OSTs/OSCs and a stripe count of 2, you can cache up to 64MB on the client without having to wait for any RPCs to complete. That is why you see a performance cliff for writes beyond 32MB.<br>
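A minimal sketch of the arithmetic above, assuming a single file is being written, one OSC per OST in its stripe, and the default max_dirty_mb of 32MB. The function names are hypothetical, and the model only covers how much dirty data the client can buffer before waiting on RPCs; it does not predict throughput. The 10-OST case corresponds to this setup (5 OSTs on each of 2 OSS nodes) striped across every OST.

```python
# Hypothetical helpers modelling the client-side dirty-data bound described above.
# Assumptions: one file being written, one OSC per OST in its stripe,
# default max_dirty_mb of 32 MB per OSC. Not a throughput model.

DEFAULT_MAX_DIRTY_MB = 32  # default per-OSC dirty-data limit quoted above


def client_dirty_limit_mb(stripe_count: int, max_dirty_mb: int = DEFAULT_MAX_DIRTY_MB) -> int:
    """Upper bound on unwritten dirty data the client can buffer for one file
    before the writing process has to wait for RPCs to complete."""
    if stripe_count < 1:
        raise ValueError("stripe_count must be at least 1")
    return stripe_count * max_dirty_mb


def write_exceeds_dirty_limit(file_mb: int, stripe_count: int,
                              max_dirty_mb: int = DEFAULT_MAX_DIRTY_MB) -> bool:
    """True if writing file_mb of data exceeds what the client can buffer,
    so the writer must at some point wait for data to reach the OSTs."""
    return file_mb > client_dirty_limit_mb(stripe_count, max_dirty_mb)


if __name__ == "__main__":
    # 2 OSTs, stripe count 2: 64 MB can be buffered, as stated above.
    print(client_dirty_limit_mb(2))    # 64
    # 10 OSTs (5 per OSS x 2 OSS), striped over all of them: 320 MB.
    print(client_dirty_limit_mb(10))   # 320
    print(write_exceeds_dirty_limit(128, 2))   # True
    print(write_exceeds_dirty_limit(128, 10))  # False
```

Because max_dirty_mb is a per-OSC limit, a wider stripe (or a larger max_dirty_mb) raises how much a single file can buffer on the client; whether that improves sustained throughput for files much larger than the buffer is a separate question.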


</blockquote></div><div><br>So should true write performance be measured only with files larger than 128MB? If we do see a large number of large files being created on the Lustre filesystem, is this something that can be tuned on the client side? If so, where and how do I set it, and what settings would you recommend? <br>


</div><div class="im"><div> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

It should be clear that the read graphs are meaningless, due to the client's local cache of the file.  I'd hazard a guess that you are not getting 100GB/s from 2 OSS nodes.<br></blockquote></div><div><br>Agreed.  Is there a way to find out the size of the local cache on the clients?<br>
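One rough way to judge whether read results could have come from the client's cache is to compare the iozone file size with the client's memory as Linux reports it in /proc/meminfo (the MemTotal and Cached fields, in kB). The sketch below assumes that any file no larger than MemTotal may be re-read from the page cache; this is a simplification, and the helper names are hypothetical.

```python
# Hypothetical helpers: is an iozone read test likely to be served from the
# client's page cache rather than from the OSTs?
# Assumption: a file no larger than total client RAM may be re-read from cache.
# Real behaviour also depends on other memory use on the client.

from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: numeric value} (kB for most fields)."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def read_test_may_hit_cache(file_mb: int, meminfo: Dict[str, int]) -> bool:
    """True if the test file is no larger than total client RAM,
    so its reads may be satisfied from the local page cache."""
    mem_total_mb = meminfo["MemTotal"] // 1024
    return file_mb <= mem_total_mb


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:  # Linux only
            info = parse_meminfo(f.read())
        print("MemTotal MB:", info["MemTotal"] // 1024)
        print("Cached MB:  ", info.get("Cached", 0) // 1024)
    except (OSError, KeyError):
        print("/proc/meminfo not available on this system")
```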


 <br></div><div class="im"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<br>

Also, what is the interconnect on the client?  If you are using a single 10GigE link, then 1GB/s is as fast as you can possibly write large files to the OSTs, regardless of the striping.<br></blockquote></div><div><br>I am using InfiniBand (QDR) interconnects for all nodes.<br>
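The link-speed arithmetic behind the 10GigE remark, sketched for comparison with QDR InfiniBand. Assumptions: 10GigE carries 10 Gbit/s of data, and a 4x QDR link signals at 40 Gbit/s with 8b/10b encoding, leaving 32 Gbit/s of data. Protocol overhead is ignored, so these figures are upper bounds.

```python
# Back-of-the-envelope link bandwidth for the interconnects mentioned above.
# Assumptions: 10GigE = 10 Gbit/s of data; QDR InfiniBand 4x = 4 lanes at
# 10 Gbit/s signalling each, 8b/10b encoded. Protocol overhead is ignored.

def gbit_to_gbyte(gbit_per_s: float) -> float:
    """Convert Gbit/s to GB/s (decimal units, 8 bits per byte)."""
    return gbit_per_s / 8.0


def qdr_ib_data_rate_gbit(lanes: int = 4, signal_gbit_per_lane: float = 10.0) -> float:
    """Data rate of a QDR InfiniBand link after 8b/10b encoding."""
    return lanes * signal_gbit_per_lane * 8.0 / 10.0


if __name__ == "__main__":
    print(gbit_to_gbyte(10.0))                     # 1.25 (single 10GigE)
    print(gbit_to_gbyte(qdr_ib_data_rate_gbit()))  # 4.0  (QDR IB 4x)
```

On these numbers a single QDR link per client offers roughly 4GB/s, so the client interconnect alone would not seem to explain a 1GB/s plateau; other limits such as the OSS links or the OST disks may still apply.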


 </div><div class="im"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<br>

Cheers, Andreas<br><font color="#888888">

--<br>

Andreas Dilger<br>

Sr. Staff Engineer, Lustre Group<br>

Sun Microsystems of Canada, Inc.<br>

<br>

</font></blockquote></div></div><br>

</blockquote></div><br>