[Lustre-discuss] Client directory entry caching

Jeremy Filizetti jeremy.filizetti at gmail.com
Tue Aug 3 19:59:54 PDT 2010


On Tue, Aug 3, 2010 at 12:49 PM, Daire Byrne <daire.byrne at gmail.com> wrote:

> Oleg,
>
> On Tue, Aug 3, 2010 at 5:21 AM, Oleg Drokin <oleg.drokin at oracle.com>
> wrote:
> >> So even with the metadata going over NFS the opencache in the client
> >> seems to make quite a difference (I'm not sure how much the NFS client
> >> caches though). As expected I see no mdt activity for the NFS export
> >> once cached. I think it would be really nice to be able to enable the
> >> opencache on any lustre client. A couple of potential workloads that I
> >
> > A simple workaround for you to enable opencache on a specific client
> > would be to add cr_flags |= MDS_OPEN_LOCK; in
> > mdc/mdc_lib.c:mds_pack_open_flags()
>
> Yea that works - cheers. FYI some comparisons with a simple find on a
> remote client (~33,000 files):
>
>
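For reference, the change Oleg describes is a one-line addition in the
client-side mdc code. A rough sketch (abbreviated, written against
1.8-era sources, so the surrounding function body will differ in your
tree; treat it as a pointer to where the flag goes, not a drop-in
patch):

    /* lustre/mdc/mdc_lib.c */
    static __u32 mds_pack_open_flags(__u32 flags)
    {
            __u32 cr_flags = 0;

            /* ... existing translation of VFS open flags elided ... */

            /* Workaround: always request an open lock from the MDS so
             * the client caches opens for this file. */
            cr_flags |= MDS_OPEN_LOCK;

            return cr_flags;
    }
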
That's not too bad even in the uncached case. What kind of round-trip-time
delay do you have between your client and the servers?


>  find /mnt/lustre (not cached) = 41 secs
>  find /mnt/lustre (cached) = 19 secs
>  find /mnt/lustre (opencache) = 3 secs
>
> The "ls -lR" case is still having to query the MDS a lot (for
> getxattr) which becomes quite noticeable in the WAN case. Apparently
> the 1.8.4 client already addresses this (#15587?). I might try that
> patch too...
>

The patch for bug 15587 addresses problems with SLES 11 (and possibly
other) patchless clients that have CONFIG_SECURITY_FILE_CAPABILITIES
enabled.  That configuration severely hurts performance over a WAN (see
bug 21439): because there is no xattr caching, every write requires an
RPC to the MDS, so you won't see any parallelism.
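A quick way to check whether a client is affected is to time a burst of
small writes: with normal client-side write-back caching the per-write
cost is microseconds, while a synchronous MDS round trip per write
pushes it toward the full RTT. A minimal userspace sketch (the path is
a placeholder):

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/time.h>
    #include <unistd.h>

    int main(void)
    {
            const char *path = "/mnt/lustre/write_test";  /* placeholder */
            char buf[4096];
            struct timeval t0, t1;
            int i, fd;

            fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
            if (fd < 0) { perror("open"); return 1; }
            memset(buf, 0, sizeof(buf));

            gettimeofday(&t0, NULL);
            for (i = 0; i < 100; i++)
                    if (write(fd, buf, sizeof(buf)) != sizeof(buf)) {
                            perror("write");
                            return 1;
                    }
            gettimeofday(&t1, NULL);
            close(fd);

            /* If each write pays an MDS round trip, ms/write ~= RTT. */
            printf("%.2f ms/write\n",
                   ((t1.tv_sec - t0.tv_sec) * 1e3 +
                    (t1.tv_usec - t0.tv_usec) / 1e3) / 100.0);
            return 0;
    }
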


> > I guess we really need to have an option for this, but I am not sure
> > if we want it on the client, server, or both.
>
>
It would be nice to have it as an option on both, so WAN users could see
the benefit without patching code.


> Doing it client side with the minor modification you suggest is
> probably enough for our purposes for the time being. Thanks.
>
> >> can think of that would benefit are WAN clients and clients that need
> >> to do mainly metadata (e.g. scanning the filesystem, rsync --link-dest
> >> hardlink snapshot backups). For the WAN case I'd be quite interested
> >
> > Open is a very narrow metadata case, so if you do metadata but no opens
> > you would get zero benefit from the open cache.
>
> I suppose the recursive scan case is a fairly low-frequency operation,
> but it is also one where Lustre has always suffered noticeably worse
> performance compared to something simpler like NFS. Slightly off
> topic (and I've kinda asked this before), but is there a good reason
> why link() speeds in Lustre are so slow compared to something like NFS?
> A quick comparison of doing a "cp -al" from a remote Lustre client and
> an NFS client (to a fast NFS server):
>

Another consideration for WAN performance when creating files is the stripe
count.  When you start writing to a file, each OSC's lock is requested only
by the first RPC to that OSC, rather than locks being requested from all
OSCs as soon as the first lock is needed.  Changing this would require some
code changes, but it would be another nice optimization for WAN
performance.  We do some work over a 200 ms RTT link, and creating a file
striped across 8 OSTs with 1 MB stripes takes 1.6 seconds just to write the
first 8 MB, since the lock requests are synchronous operations.  With a
single stripe it would take only ~200 ms.
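The arithmetic is simply stripe count times RTT, because the lock
requests are serialized; a toy calculation with the numbers from the
example above:

    #include <stdio.h>

    int main(void)
    {
            double rtt_ms = 200.0;  /* WAN round-trip time */
            int stripes = 8;        /* OSTs the file is striped over */

            /* One synchronous lock RPC per OSC, one after another: */
            printf("serialized locks: %.1f s\n",
                   stripes * rtt_ms / 1000.0);
            /* If all OSC locks were requested in parallel up front: */
            printf("parallel locks:   %.1f s\n", rtt_ms / 1000.0);
            return 0;
    }
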


>
>  cp -fa /mnt/lustre/blah /mnt/lustre/blah2 = ~362 files/sec
>  cp -fa /mnt/nfs/blah /mnt/nfs/blah2 = ~1863 files/sec
>
> Is it just the extra depth of the lustre stack/code path? Is there
> anything we could do to speed this up if we know that no other client
> will touch these dirs while we hardlink them?
>
> > Also getting this extra lock puts some extra CPU load on the MDS, but
> > if we go this far, we can then somewhat simplify rep-ack and hold it
> > for a much shorter time in a lot of cases, which would greatly help WAN
> > workloads that happen to create files in the same dir from many nodes,
> > for example (see bug 20373, first patch).
> > Just be aware that testing with more than 16,000 clients at ORNL
> > clearly shows degradations at LAN latencies.
>
> Understood. I think we are a long way off hitting those kinds of
> limits. The WAN case is interesting because it is the interactive
> speed of browsing the filesystem that is usually the most noticeable
> (and annoying) artefact of being many miles away from the server. Once
> you start accessing the files you want then you are reasonably happy
> to be limited by your connection's overall bandwidth.
>

I agree, there are a lot of things I'd like to see added to improve that
interactive WAN performance.  There is a general metadata-over-WAN
performance bug, 18526.  I think readdir+ would help, and larger RPCs to
the MDS (bug 17833) are necessary to overcome the transactional nature of
Lustre's stat/getxattr today.  As for statahead helping "ls -l"
performance, I have some numbers in my LUG2010 presentation (
http://wiki.lustre.org/images/6/60/LUG2010_Filizetti_SMSi.pdf) about the
improvements that Size on MDS (SOM) adds compared to Lustre 1.8.

Jeremy


>
> Thanks for the feedback,
>
> Daire