[Lustre-discuss] Client directory entry caching

Oleg Drokin oleg.drokin at oracle.com
Wed Aug 4 11:23:37 PDT 2010


Hello!

On Aug 4, 2010, at 2:04 PM, Daire Byrne wrote:
>> Hm, initially I was going to say that find is not open-intensive so it should
>> not benefit from opencache at all.
>> But then I realized if you have a lot of dirs, then indeed there would be a
>> positive impact on subsequent reruns.
>> I assume that the opencache result is a second run and first run produces
>> same 41 seconds?
> Actually I assumed it would be but I guess there must be some repeat
> opens because the 1st run with opencache is actually better. I have

An open followed by a stat would also benefit from opencache, because it removes one RPC for the stat.
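
To make the pattern concrete, here is a minimal Python sketch of the syscall sequence in question: open a file, then stat the same path. It shows only the access pattern. Whether the trailing stat costs an extra MDS RPC on Lustre depends on client-side caching such as opencache, which user-space code cannot observe. The function name open_then_stat is hypothetical.

```python
import os


def open_then_stat(path):
    """Open a file, stat it by path while it is open, then close it.

    This mirrors the syscall sequence only; any caching happens inside
    the filesystem client and is invisible from here.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # With an open cache, the client may already hold the attributes
        # returned by the open, so this stat need not cost another
        # round trip to the metadata server.
        return os.stat(path)
    finally:
        os.close(fd)
```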

> 
> syscall   lustre nfs
> --------------------------
> stat        7s    0.01s
> lstat      36s    7s
> link       29s    16s
> getxattr   5s    0.29s
> setxattr  30s    0.25s
> open       1s    2s
> mkdir      6s    3s
> lchown    11s    2s
> futimesat 11s    2s

Hm. That's interesting. And this is over a high-latency link, is it?
Was this also with debug disabled?
I don't think lstat is much different from stat if the target is not a
symlink.
I wonder if most of the difference with lstat comes from the fact that for us
lstat is an RPC (it is mostly used after opens or readdirs, and it fetches attrs
from the OSTs too), whereas NFS not only caches the data, but its readdirplus is
also better than our statahead because it fetches all file info, including size
and times. Statahead, confusingly, does not cache full stat information, only
what is available on the MDS.
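
The access pattern that statahead (and NFS readdirplus) try to speed up is a directory scan that lstat()s every entry, which is roughly what find and "cp -la" do. A minimal Python sketch, using a hypothetical helper scan_dir: os.scandir performs the readdir, and DirEntry.stat(follow_symlinks=False) is an lstat. Per the Python documentation, that call always requires a system call on Unix, so on a network filesystem each entry can mean a metadata round trip (and on Lustre, OST RPCs for the size) unless the client has already prefetched the attributes.

```python
import os


def scan_dir(path):
    """Return {name: (size, mtime)} for every entry, lstat-ing each one."""
    result = {}
    with os.scandir(path) as it:  # readdir(): names and file types only
        for entry in it:
            # One lstat() per entry. On Unix this is always a system call,
            # i.e. potentially one metadata round trip per file on a
            # network filesystem unless the attributes were prefetched.
            st = entry.stat(follow_symlinks=False)
            result[entry.name] = (st.st_size, st.st_mtime)
    return result
```

For a regular file, stat and lstat return the same attributes; they only differ for a symlink, where lstat describes the link itself (its size is the length of the target name).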

I had a stab at a patch to fetch the OST data in parallel too, but that turned out to be
far from trivial and never worked completely correctly. I may need
to take another look at it after Johann revamps the request-set logic a bit to make
adding requests to sets easier.

> It doesn't quite explain the 4:1 speed difference but the (l)stat
> heavy "cp -la" is consistently that much faster on NFS. Is the NFS
> server so much faster for get/setxattr because it returns "EOPNOTSUPP"
> for setxattr? Can we do something similar for the Lustre client if we

If it returns EOPNOTSUPP on the client side, then there is no RPC and
the reply is instant. For Lustre it is an RPC round trip, which is not exactly
cheap.
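
On the caller side, a tool like cp typically treats "not supported" from setxattr as benign and carries on. A minimal Python sketch of that behaviour, assuming Linux (os.setxattr is Linux-only); copy_xattr_best_effort is a hypothetical name, and the setter is injectable only so the logic can be exercised without a filesystem. It does not show how a client decides locally to refuse xattrs; it only shows why a local EOPNOTSUPP is cheap for the caller, whereas a server-side refusal still costs a full round trip.

```python
import errno
import os

# errno values treated as "extended attributes are not supported here".
# On Linux ENOTSUP and EOPNOTSUPP are the same value.
UNSUPPORTED = {errno.ENOTSUP, errno.EOPNOTSUPP}


def copy_xattr_best_effort(path, name, value, setxattr=None):
    """Try to set one extended attribute; return False if unsupported."""
    if setxattr is None:
        setxattr = os.setxattr  # Linux-only
    try:
        setxattr(path, name, value)
    except OSError as exc:
        if exc.errno in UNSUPPORTED:
            return False  # ignore, as cp does for unsupported xattrs
        raise  # real errors (EACCES, ENOSPC, ...) still propagate
    return True
```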

> don't care about extended attributes? The link() times are still
> almost twice as slow on Lustre though - that may be related to a
> slowish (test) MDT disk.  Like Andreas said I don't understand why

Lustre also does some extra work for link, such as rep-ack (an extra confirmation
from the client to the server that it got the link reply); the same goes for mkdir.

I am not sure why there is such a big difference for chown and time updates, though
I now realise that we also need to talk to the OSTs to update ownership and times there,
which adds up even though those requests should be sent in parallel.
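
A toy latency model makes the "adds up" point concrete. It assumes (an assumption for illustration, not measured Lustre behaviour) one MDS round trip followed by one round trip to each OST holding a stripe of the file, with the OST updates either issued together or one after another. The function name setattr_latency is hypothetical.

```python
def setattr_latency(mds_rtt, ost_rtts, parallel=True):
    """Estimate the wall time of one chown/utimes on a striped file.

    Toy model: one MDS round trip, then one round trip per OST stripe.
    If the OST requests go out in parallel, only the slowest one counts;
    if they were serialized, their latencies would add up.
    """
    if not ost_rtts:
        return mds_rtt
    ost_part = max(ost_rtts) if parallel else sum(ost_rtts)
    return mds_rtt + ost_part
```

For example, with a 1 ms round trip to every server and a file striped over four OSTs, the model gives 2 ms with parallel OST updates and 5 ms if they were serialized, versus 1 ms for a protocol that needs only a single server round trip per call.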

> there is a setxattr RPC when we didn't get any data from getxattr but
> that is probably more down to "cp" than lustre?

Yes, I think this is more about cp; you can see that NFS also has setxattr attempts.

Bye,
    Oleg

