<br><div class="gmail_quote">On Tue, Aug 3, 2010 at 12:49 PM, Daire Byrne <span dir="ltr"><<a href="mailto:daire.byrne@gmail.com">daire.byrne@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

Oleg,<br>

<div class="im"><br>

On Tue, Aug 3, 2010 at 5:21 AM, Oleg Drokin <<a href="mailto:oleg.drokin@oracle.com">oleg.drokin@oracle.com</a>> wrote:<br>

>> So even with the metadata going over NFS the opencache in the client<br>

>> seems to make quite a difference (I'm not sure how much the NFS client<br>

>> caches though). As expected I see no mdt activity for the NFS export<br>

>> once cached. I think it would be really nice to be able to enable the<br>

>> opencache on any lustre client. A couple of potential workloads that I<br>

><br>

> A simple workaround for you to enable opencache on a specific client would<br>

> be to add cr_flags |= MDS_OPEN_LOCK; in mdc/mdc_lib.c:mds_pack_open_flags()<br>

<br>

</div>Yea that works - cheers. FYI some comparisons with a simple find on a<br>

remote client (~33,000 files):<br>

<br></blockquote><div><br>That's not too bad even in the uncached case. What kind of round-trip time do you have between your client and servers?<br> <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


  find /mnt/lustre (not cached) = 41 secs<br>

  find /mnt/lustre (cached) = 19 secs<br>

  find /mnt/lustre (opencache) = 3 secs<br>

<br>

The "ls -lR" case is still having to query the MDS a lot (for<br>

getxattr) which becomes quite noticeable in the WAN case. Apparently<br>

the 1.8.4 client already addresses this (#15587?). I might try that<br>

patch too...<br></blockquote><div><br>The patch for bug 15587 addresses problems with SLES 11 (maybe others?) patchless clients with CONFIG_FILE_SECURITY_CAPABILITIES enabled.  That configuration severely affects performance over a WAN (see bug 21439): because there is no xattr caching, each write requires an RPC to the MDS, so you won't see any parallelism.<br>

<br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="im"><br>

> I guess we really need to have an option for this, but I am not sure<br>

> if we want it on the client, server, or both.<br>

<br></div></blockquote><div><br>It would be nice to have options for both, so that WAN users can see the benefit without patching code.<br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="im">

</div>Doing it client side with the minor modification you suggest is<br>

probably enough for our purposes for the time being. Thanks.<br>

<div class="im"><br>

>> can think of that would benefit are WAN clients and clients that need<br>

>> to do mainly metadata (e.g. scanning the filesystem, rsync --link-dest<br>

>> hardlink snapshot backups). For the WAN case I'd be quite interested<br>

><br>

> Open is very narrow metadata case, so if you do metadata but no opens you would<br>

> get zero benefit from open cache.<br>

<br>

</div>I suppose the recursive scan case is a fairly low frequency operation<br>

but is also one that Lustre has always suffered noticeably worse<br>

performance when compared to something simpler like NFS. Slightly off<br>

topic (and I've kinda asked this before) but is there a good reason<br>

why link() speeds in Lustre are so slow compared to something like NFS?<br>

A quick comparison of doing a "cp -al" from a remote Lustre client and<br>

an NFS client (to a fast NFS server):<br></blockquote><div><br>Another consideration for WAN performance when creating files is the stripe count.  When you start writing to a file, the first RPC to each OSC requests the lock for that OSC, rather than the locks being requested from all OSCs when the first lock is requested.  Changing this would require some code changes, but it would be another nice optimization for WAN performance.  We do some work over a link with a 200 ms RTT, and creating a file striped across 8 OSTs with 1 MB stripes takes 1.6 seconds just to write the first 8 MB, since the locks are synchronous operations (8 x 200 ms).  For a single stripe it would only take ~200 ms.<br>
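<br>As a rough illustration of the arithmetic above, here is a minimal Python sketch. It assumes one synchronous lock round trip per stripe and ignores data transfer time; the function names are made up for this example and are not part of Lustre.<br>
<pre>
```python
def serial_lock_delay_ms(stripe_count, rtt_ms):
    """Lock wait when each OSC's lock is requested one after another.

    Assumption: exactly one synchronous round trip per stripe, nothing else.
    """
    return stripe_count * rtt_ms


def parallel_lock_delay_ms(stripe_count, rtt_ms):
    """Lock wait if all OSC locks were requested together (hypothetical)."""
    return rtt_ms if stripe_count else 0


if __name__ == "__main__":
    rtt_ms = 200  # round-trip time quoted in this mail
    for stripes in (1, 8):
        serial = serial_lock_delay_ms(stripes, rtt_ms)
        parallel = parallel_lock_delay_ms(stripes, rtt_ms)
        print(stripes, "stripe(s): serial", serial, "ms, parallel", parallel, "ms")
```
</pre>
With 8 stripes and a 200 ms RTT the serial model gives the 1600 ms quoted above, while requesting all locks at once would, under the same assumptions, cost about one round trip.<br>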

 </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<br>

  cp -fa /mnt/lustre/blah /mnt/lustre/blah2 = ~362 files/sec<br>

  cp -fa /mnt/nfs/blah /mnt/nfs/blah2 = ~1863 files/sec<br>

<br>

Is it just the extra depth of the lustre stack/code path? Is there<br>

anything we could do to speed this up if we know that no other client<br>

will touch these dirs while we hardlink them?<br>

<div class="im"><br>

> Also getting this extra lock puts some extra cpu load on MDS, but if we go this far,<br>

> we can then somewhat simplify rep-ack and hold it for much shorter time in<br>

> a lot of cases which would greatly help WAN workloads that happen to create<br>

> files in same dir from many nodes, for example. (see bug 20373, first patch)<br>

> Just be aware that testing with more than 16000 clients at ORNL clearly shows<br>

> degradations at LAN latencies.<br>

<br>

</div>Understood. I think we are a long way off hitting those kinds of<br>

limits. The WAN case is interesting because it is the interactive<br>

speed of browsing the filesystem that is usually the most noticeable<br>

(and annoying) artefact of being many miles away from the server. Once<br>

you start accessing the files you want then you are reasonably happy<br>

to be limited by your connection's overall bandwidth.<br></blockquote><div><br>I agree, there are a lot of things I'd like to see added to improve that interactive WAN performance.  There is a general metadata WAN performance bug, 18526.  I think readdir+ would help, and larger RPCs to the MDS (bug 17833) are necessary to overcome the transactional nature that Lustre stat/getxattr have today.  As for statahead helping "ls -l" performance, I have some numbers in my LUG2010 presentation (<a href="http://wiki.lustre.org/images/6/60/LUG2010_Filizetti_SMSi.pdf">http://wiki.lustre.org/images/6/60/LUG2010_Filizetti_SMSi.pdf</a>) about the improvements that size on metadata (SOM) adds compared to Lustre 1.8.<br>

<br>Jeremy<br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<br>

Thanks for the feedback,<br>

<font color="#888888"><br>

Daire<br>

</font><div><div></div><div class="h5">

</div></div></blockquote></div><br>