[lustre-discuss] Lustre caching behavior
Patrick Farrell
pfarrell at ddn.com
Tue Mar 24 16:42:53 PDT 2026
John,
Are you able to pin this down into more of a reproducer? Even just a more granular description.
I’d like to explore it - this is poor behavior, not desirable for sure. I’m curious in particular about the lock cancellation - my understanding had been that the glimpse-request read-lock path was entirely opportunistic (NONBLOCKING in LDLM terms) and would never cause a cancel (i.e., my understanding doesn’t accord with Andreas’s). I was fairly sure about that.
But this behavior suggests that I’ve got that wrong, or that something else weird is going on.
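For anyone wanting to poke at this, a hedged sketch of how one might watch whether the stat()s actually cancel the writer's DLM extent locks (the parameter names are standard lctl tunables, but exact paths and availability vary by Lustre version):

```shell
# On the node doing the I/O, snapshot DLM lock counts and cache state
# before and after running stat()/ls -l against the test directory.
# Parameter paths are standard but can differ between Lustre versions.
lctl get_param ldlm.namespaces.*osc*.lock_count
lctl get_param llite.*.max_cached_mb
lctl get_param osc.*.cur_dirty_bytes

# If lock_count drops and the cached pages shrink right after the stat(),
# the glimpse is causing cancellation rather than being purely
# opportunistic (NONBLOCKING).
```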
Patrick
________________________________
From: John Bauer <bauerj at iodoctors.com>
Sent: Tuesday, March 24, 2026 6:12 PM
To: Oleg Drokin <green at whamcloud.com>
Cc: lustre-discuss at lists.lustre.org <lustre-discuss at lists.lustre.org>; adilger at thelustrecollective.com <adilger at thelustrecollective.com>; Patrick Farrell <pfarrell at ddn.com>
Subject: Re: [lustre-discuss] Lustre caching behavior
Oleg,
I recreated the plots with a shared time axis. In your comments you mention the "writer node". Note that all the reading and writing was done on a single compute node.
Andreas,
Concerning your comments, this is really troubling when considering benchmarking. I first got a hint of this while benchmarking on a system that uses quotas. I suspect the quota daemon periodically walks the file system doing stat()s, and one of my runs had significantly better performance; I now think it was because my directory was hit by a stat(). That improvement was a one-off. Could quotas cause this?

Benchmarkers beware: doing an ls -l in your benchmarking directory could significantly alter performance. I've been benchmarking I/O for decades and have never observed this behavior before. More correctly, I have seen odd one-off behaviors, but never figured out what to attribute them to.
John
[https://www.dropbox.com/scl/fi/b8h88muy9umyt4zey1d0d/sim_fpa.png?rlkey=atmyesl61p40s9vmbpmjkfv5b&st=icbay2kp&dl=0]
Cached bytes for all 6 OSCs versus time are in this next plot. Top frame: without stat()s. Bottom frame: with stat()s. Which OSCs are associated with the first file can be determined from its start time.
Notice in the top frame that the first file, even though it was not read or written after RTC=106 seconds, never gave up any of its cache, even though the second file was in dire need of the memory for caching. Files were not closed until all reading and writing was complete. The bottom frame, with the stat()s, is much better behaved and much faster.
[https://www.dropbox.com/scl/fi/8xdif7vu3penwtx1bepjw/sim_osc_cached.png?rlkey=rqq9che3yjtzds8hjkg411nec&st=0c5pknhn&dl=0]
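For reference, a rough reproducer sketch of the pattern above (the mount point, file sizes, and the once-per-second stat cadence are placeholders for illustration, not the actual benchmark settings):

```shell
# Rough reproducer sketch on a Lustre mount; paths and sizes are
# placeholders, not the original benchmark's parameters.
cd /mnt/lustre/benchdir

# Write then re-read file1, then start writing file2 while file1's
# pages are still cached on the client.
dd if=/dev/zero of=file1 bs=1M count=8192
dd if=file1 of=/dev/null bs=1M
dd if=/dev/zero of=file2 bs=1M count=8192 &

# Variant A: do nothing and watch file1's OSC cache never shrink.
# Variant B: periodically stat() the directory while file2 is written,
# which (per the observation above) frees file1's cache and speeds
# up the second file's I/O.
while kill -0 $! 2>/dev/null; do ls -l > /dev/null; sleep 1; done
wait
```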
On 3/24/2026 5:08 PM, Oleg Drokin wrote:
On Tue, 2026-03-24 at 21:08 +0000, Andreas Dilger via lustre-discuss wrote:
read locks (which would be counterproductive), and whether it would
be possible to downgrade the DLM write locks to read locks while
preserving the cached data on the client(s).
I think we have long wanted this functionality but never implemented it.
The other thing in those graphs (a bit annoying to see because the scale differs from graph to graph) is that the start-workload seems to run faster, at least on the writer node, but I am not sure if that's part of the question or just normal variance of the environment.