<html><body bgcolor="#FFFFFF"><div>Since Bug 22492 hit a lot of people, it sounds like opencache isn't generally useful unless enabled on every node. Is there an easy way to force files out of the cache (i.e., <span style="font-family: monospace; white-space: pre;">echo 3 &gt; /proc/sys/vm/drop_caches</span>)?</div><div><br></div><div>Kevin</div><div><br><br>On Aug 3, 2010, at 11:50 AM, Oleg Drokin &lt;<a href="mailto:oleg.drokin@oracle.com">oleg.drokin@oracle.com</a>&gt; wrote:<br><br></div><div></div><blockquote type="cite"><div><span>Hello!</span><br><span></span><br><span>On Aug 3, 2010, at 12:49 PM, Daire Byrne wrote:</span><br><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>So even with the metadata going over NFS the opencache in the client</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>seems to make quite a difference (I'm not sure how much the NFS client</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>caches though). 
As expected I see no MDT activity for the NFS export</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>once cached. I think it would be really nice to be able to enable the</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>opencache on any Lustre client. A couple of potential workloads that I</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>A simple workaround for you to enable opencache on a specific client would</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>be to add cr_flags |= MDS_OPEN_LOCK; in mdc/mdc_lib.c:mds_pack_open_flags()</span><br></blockquote></blockquote><blockquote type="cite"><span>Yea that works - cheers. FYI some comparisons with a simple find on a</span><br></blockquote><blockquote type="cite"><span>remote client (~33,000 files):</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span> find /mnt/lustre (not cached) = 41 secs</span><br></blockquote><blockquote type="cite"><span> find /mnt/lustre (cached) = 19 secs</span><br></blockquote><blockquote type="cite"><span> find /mnt/lustre (opencache) = 3 secs</span><br></blockquote><span></span><br><span>Hm, initially I was going to say that find is not open-intensive, so it should</span><br><span>not benefit from opencache at all.</span><br><span>But then I realized that if you have a lot of dirs, there would indeed be a</span><br><span>positive impact on subsequent reruns.</span><br><span>I assume that the opencache result is from a second run and that the first run</span><br><span>produces the same 41 seconds?</span><br><span></span><br><span>BTW, another unintended side effect you might experience on a network with a mix of</span><br><span>opencache-enabled and opencache-disabled clients: if you run something (or open it for 
write)</span><br><span>on an opencache-enabled client, you might have problems writing (or executing)</span><br><span>that file from non-opencache nodes for as long as the file handle</span><br><span>remains cached on the client. This is because if an open lock was not requested,</span><br><span>we don't try to invalidate existing ones (that is expensive), so the MDS thinks</span><br><span>the file is genuinely open for write/execution and rejects conflicting accesses</span><br><span>with EBUSY.</span><br><span></span><br><blockquote type="cite"><span>performance when compared to something simpler like NFS. Slightly off</span><br></blockquote><blockquote type="cite"><span>topic (and I've kinda asked this before) but is there a good reason</span><br></blockquote><blockquote type="cite"><span>why link() speeds in Lustre are so slow compared to something like NFS?</span><br></blockquote><blockquote type="cite"><span>A quick comparison of doing a "cp -al" from a remote Lustre client and</span><br></blockquote><blockquote type="cite"><span>an NFS client (to a fast NFS server):</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span> cp -fa /mnt/lustre/blah /mnt/lustre/blah2 = ~362 files/sec</span><br></blockquote><blockquote type="cite"><span> cp -fa /mnt/nfs/blah /mnt/nfs/blah2 = ~1863 files/sec</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>Is it just the extra depth of the Lustre stack/code path? 
Is there</span><br></blockquote><blockquote type="cite"><span>anything we could do to speed this up if we know that no other client</span><br></blockquote><blockquote type="cite"><span>will touch these dirs while we hardlink them?</span><br></blockquote><span></span><br><span>Hm, this is the first complaint about this that I have heard.</span><br><span>I just looked at the strace of cp -fal (which I guess you meant instead of just -fa, which</span><br><span>would just copy everything).</span><br><span></span><br><span>So we traverse the tree down, creating a dir structure in parallel first (or just doing it in readdir order):</span><br><span></span><br><span>open("/mnt/lustre/a/b/c/d/e/f", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 3</span><br><span>+1 RPC</span><br><span></span><br><span>fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0</span><br><span>+1 RPC (if no opencache)</span><br><span></span><br><span>fcntl(3, F_SETFD, FD_CLOEXEC)           = 0</span><br><span>getdents(3, /* 4 entries */, 4096)      = 96</span><br><span>getdents(3, /* 0 entries */, 4096)      = 0</span><br><span>+1 RPC</span><br><span></span><br><span>close(3)                                = 0</span><br><span>+1 RPC (if no opencache)</span><br><span></span><br><span>lstat("/mnt/lustre/a/b/c/d/e/f/g", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0</span><br><span>(should be cached, so no RPC)</span><br><span></span><br><span>mkdir("/mnt/lustre/blah2/b/c/d/e/f/g", 040755) = 0</span><br><span>+1 RPC</span><br><span></span><br><span>lstat("/mnt/lustre/blah2/b/c/d/e/f/g", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0</span><br><span>+1 RPC</span><br><span></span><br><span>stat("/mnt/lustre/blah2/b/c/d/e/f/g", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0</span><br><span>(should be cached, so no RPC)</span><br><span></span><br><span>Then we get to files:</span><br><span>link("/mnt/lustre/a/b/c/d/e/f/g/k/8", "/mnt/lustre/blah2/b/c/d/e/f/g/k/8") = 0</span><br><span>+1 
RPC</span><br><span></span><br><span>futimesat(AT_FDCWD, "/mnt/lustre/blah2/b/c/d/e/f/g/k", {{1280856246, 0}, {1280856291, 0}}) = 0</span><br><span>+1 RPC</span><br><span></span><br><span>Then we start traversing the just-created tree upward, chowning it:</span><br><span>chown("/mnt/lustre/blah2/b/c/d/e/f/g/k", 0, 0) = 0</span><br><span>+1 RPC</span><br><span></span><br><span>getxattr("/mnt/lustre/a/b/c/d/e/f/g/k", "system.posix_acl_access", 0x7fff519f0950, 132) = -1 ENODATA (No data available)</span><br><span>+1 RPC</span><br><span></span><br><span>stat("/mnt/lustre/a/b/c/d/e/f/g/k", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0</span><br><span>(not sure why there is another stat here, we already did it on the way up. Should be cached)</span><br><span></span><br><span>setxattr("/mnt/lustre/blah2/b/c/d/e/f/g/k", "system.posix_acl_access", "\x02\x00\x00\x00\x01\x00\x07\x00\xff\xff\xff\xff\x04\x00\x05\x00\xff\xff\xff\xff \x00\x05\x00\xff\xff\xff\xff", 28, 0) = 0</span><br><span>+1 RPC</span><br><span></span><br><span>getxattr("/mnt/lustre/a/b/c/d/e/f/g/k", "system.posix_acl_default", 0x7fff519f0950, 132) = -1 ENODATA (No data available)</span><br><span>+1 RPC</span><br><span></span><br><span>stat("/mnt/lustre/a/b/c/d/e/f/g/k", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0</span><br><span>Hm, stat again? Didn't we do it a few syscalls back?</span><br><span></span><br><span>stat("/mnt/lustre/blah2/b/c/d/e/f/g/k", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0</span><br><span>stat of the target. 
+1 RPC (the cache got invalidated by the link above).</span><br><span></span><br><span>setxattr("/mnt/lustre/blah2/b/c/d/e/f/g/k", "system.posix_acl_default", "\x02\x00\x00\x00", 4, 0) = 0</span><br><span>+1 RPC</span><br><span></span><br><span></span><br><span>So I guess there is a certain number of stat RPCs that would not be present on NFS</span><br><span>due to the different ways the caching works, plus all the getxattrs. Not sure if this</span><br><span>is enough to explain the 4x rate difference.</span><br><span></span><br><span>Also, you can try disabling debug (if you have not already) to see how big an impact</span><br><span>that makes. It used to be that debug affected metadata loads a lot, though</span><br><span>with the recent debug-level adjustments I think this has improved somewhat.</span><br><span></span><br><span>Bye,</span><br><span>    Oleg</span><br><span>_______________________________________________</span><br><span>Lustre-discuss mailing list</span><br><span><a href="mailto:Lustre-discuss@lists.lustre.org">Lustre-discuss@lists.lustre.org</a></span><br><span><a href="http://lists.lustre.org/mailman/listinfo/lustre-discuss">http://lists.lustre.org/mailman/listinfo/lustre-discuss</a></span><br></div></blockquote></body></html>