[Lustre-discuss] Client directory entry caching
Oleg Drokin
oleg.drokin at oracle.com
Wed Aug 4 11:23:37 PDT 2010
Hello!
On Aug 4, 2010, at 2:04 PM, Daire Byrne wrote:
>> Hm, initially I was going to say that find is not open-intensive so it should
>> not benefit from opencache at all.
>> But then I realized if you have a lot of dirs, then indeed there would be a
>> positive impact on subsequent reruns.
>> I assume that the opencache result is a second run and first run produces
>> same 41 seconds?
> Actually I assumed it would be but I guess there must be some repeat
> opens because the 1st run with opencache is actually better. I have
An open followed by a stat would also benefit from opencache, since it removes one RPC for the stat.
>
> syscall     lustre   nfs
> ----------------------------
> stat        7s       0.01s
> lstat       36s      7s
> link        29s      16s
> getxattr    5s       0.29s
> setxattr    30s      0.25s
> open        1s       2s
> mkdir       6s       3s
> lchown      11s      2s
> futimesat   11s      2s
Hm. That's interesting. And this is over a high latency link, is it?
Was this also with debug disabled?
I don't think lstat is much different from stat if the target is not a
symlink.
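One way to check this on a given mount is to time both calls over the same set of files. Below is a rough Python sketch, not the method used to produce the numbers quoted above. The helper names `time_calls` and `compare_stat_lstat` are made up for illustration, and the second pass may be served from whatever the client cached during the first, so treat the result as indicative only.

```python
import os
import time


def time_calls(func, paths):
    """Return the seconds spent calling func(path) for every path."""
    start = time.perf_counter()
    for path in paths:
        try:
            func(path)
        except OSError:
            # Broken symlinks or files removed mid-run are skipped.
            pass
    return time.perf_counter() - start


def compare_stat_lstat(root):
    """Time os.stat and os.lstat over every regular entry under root."""
    paths = [
        os.path.join(dirpath, name)
        for dirpath, _dirs, files in os.walk(root)
        for name in files
    ]
    return {
        "files": len(paths),
        "stat": time_calls(os.stat, paths),
        "lstat": time_calls(os.lstat, paths),
    }
```

Calling `compare_stat_lstat("/some/tree")` on a tree with no symlinks should show the two timings close to each other if the statement above holds for that filesystem.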
I wonder if most of the difference with lstat comes from the fact that for us
lstat is an RPC (mostly used after opens or readdirs, plus it fetches attrs from OSTs too),
whereas NFS not only caches data, its readdirplus is also better than statahead
because it fetches all file info including size and times, whereas statahead,
confusingly, does not cache the full stat information, only what is available on the MDS.
I had a stab at a patch to fetch OST data in parallel too, but that turned out to be
not all that trivial and never worked completely correctly. I might need
to take another look at it after Johann revamps the request set logic a bit to make
adding requests to sets easier.
> It doesn't quite explain the 4:1 speed difference but the (l)stat
> heavy "cp -la" is consistently that much faster on NFS. Is the NFS
> server so much faster for get/setxattr because it returns "EOPNOTSUPP"
> for setxattr? Can we do something similar for the Lustre client if we
If it does return EOPNOTSUPP on the client side then there is no RPC and
the reply is instant. For lustre it is an RPC roundtrip which is not exactly
cheap.
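For reference, this is roughly what the caller sees in either case. The sketch below is Python on Linux, where `os.setxattr` exists; the function name `xattrs_supported` and the attribute name `user.probe` are made up for illustration. It only shows how EOPNOTSUPP surfaces to an application such as cp and says nothing about where in the client the error is generated.

```python
import errno
import os


def xattrs_supported(path, name="user.probe", value=b"1"):
    """Return False if setting an xattr on path fails with EOPNOTSUPP.

    Any other error (permissions, missing file, ...) is re-raised.
    """
    if not hasattr(os, "setxattr"):
        # Platform without the Linux xattr calls in the os module.
        return False
    try:
        os.setxattr(path, name, value)
    except OSError as exc:
        if exc.errno in (errno.ENOTSUP, errno.EOPNOTSUPP):
            return False
        raise
    # Clean up the probe attribute so the file is left unchanged.
    os.removexattr(path, name)
    return True
```

A copy tool could call something like this once per destination and skip later setxattr attempts when it returns False, which would avoid the per-file roundtrip.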
> don't care about extended attributes? The link() times are still
> almost twice as slow on Lustre though - that may be related to a
> slowish (test) MDT disk. Like Andreas said I don't understand why
There is some more work for link in the case of Lustre, like rep-ack (an extra confirmation
from the client to the server that it got the link reply); the same goes for mkdir.
I am not sure why there is such a big difference with chown and time updates, though
actually I now realise we need to talk to OSTs to update ownership and times there as well,
which adds up even though the requests should be sent in parallel.
> there is an setxattr RPC when we didn't get any data from getxattr but
> that is probably more down to "cp" than lustre?
Yes, I think this is more about cp, you can see nfs also has setxattr attempts.
Bye,
Oleg