[lustre-discuss] Lustre caching behavior

John Bauer bauerj at iodoctors.com
Tue Mar 24 16:12:17 PDT 2026


Oleg,

I recreated the plots with a shared time axis.  In your comments you 
mention the "writer node".  Note that all the reading and writing was 
done on a single compute node.

Andreas,

Concerning your comments, this is really troubling when considering 
benchmarking.  I first got a hint of this while benchmarking on a system 
that uses quotas.  I suspect the quota daemon periodically walks the 
file system doing stat()s.  One of my runs had significantly better 
performance, and I now think it was because my directory was hit by a 
stat(); that improvement was a one-off.  Could quotas cause this? 
*Benchmarkers beware*: doing an *ls -l* in your benchmarking directory 
could significantly alter performance.  I've been benchmarking I/O for 
decades and have never observed this behavior before.  More correctly, 
I have seen odd one-off behaviors, but never figured out what to 
attribute them to.
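
For anyone who wants to try to reproduce the effect, the stat() walk is 
easy to mimic.  Below is a minimal C sketch (the directory argument is 
just an example) of what an *ls -l*, or hypothetically a quota daemon, 
effectively does to every file in a directory.  My working assumption, 
which the behavior in the plots seems to support, is that on Lustre each 
stat() triggers a size glimpse against the file, and that is what 
perturbs the writing client's DLM locks and the cache under them:

    /* stat() every entry in a directory, roughly what "ls -l" does.
     * On Lustre, each stat() is believed to glimpse the file size,
     * which can disturb the locks/cache of a client writing the file.
     * This is a sketch for experimentation, not a verified claim. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <dirent.h>
    #include <sys/types.h>
    #include <sys/stat.h>

    int main(int argc, char **argv)
    {
        const char *dirpath = (argc > 1) ? argv[1] : ".";
        DIR *dir = opendir(dirpath);
        if (!dir) { perror("opendir"); return EXIT_FAILURE; }

        struct dirent *de;
        while ((de = readdir(dir)) != NULL) {
            char path[4096];
            struct stat st;
            snprintf(path, sizeof(path), "%s/%s", dirpath, de->d_name);
            if (stat(path, &st) == 0)   /* the call that glimpses */
                printf("%-32s %lld bytes\n", de->d_name,
                       (long long)st.st_size);
        }
        closedir(dir);
        return EXIT_SUCCESS;
    }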

John

https://www.dropbox.com/scl/fi/b8h88muy9umyt4zey1d0d/sim_fpa.png?rlkey=atmyesl61p40s9vmbpmjkfv5b&st=icbay2kp&dl=0

All 6 OSC *cached versus time* curves are in this next plot.  Top frame 
w/o stat()s, bottom frame with stat()s.  The OSCs associated with the 
first file can be identified by their start times.

Notice in the top frame that the first file, even though it was not read 
or written after RTC=106 seconds, never gave up any of its cache, even 
while the second file was in dire need of that memory for caching.  
Files were not closed until all reading and writing had completed.  The 
bottom frame, with the stat()s, is much better behaved and much faster.

https://www.dropbox.com/scl/fi/8xdif7vu3penwtx1bepjw/sim_osc_cached.png?rlkey=rqq9che3yjtzds8hjkg411nec&st=0c5pknhn&dl=0
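
Since the first file never gives up its cache on its own, one mitigation 
that might be worth trying (an assumption on my part, I have not 
verified that the Lustre client honors it) is to have the application 
release pages explicitly once it is finished with a file, via 
posix_fadvise(POSIX_FADV_DONTNEED).  A minimal sketch, with a 
hypothetical file path:

    /* Ask the kernel to drop cached pages for a file we are done with,
     * rather than waiting for lock contention or a stat() to force the
     * cache out.  Whether this helps on a given Lustre client is an
     * assumption worth testing; the path below is illustrative only. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/lustre/scratch/file1.dat", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        /* ... last read of file1 happens here ... */

        fsync(fd);                      /* flush any dirty pages first */
        posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED); /* len 0 = whole file */
        close(fd);
        return 0;
    }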

On 3/24/2026 5:08 PM, Oleg Drokin wrote:
> On Tue, 2026-03-24 at 21:08 +0000, Andreas Dilger via lustre-discuss
> wrote:
>> read locks (which would be counter productive), and whether it would
>> be possible to downgrade the DLM write locks to read locks while
>> preserving the cached data on the client(s).
> I think we have long wanted this functionality but never implemented it.
>
> The other thing in those graphs (a bit annoying to see because the
> scale differs from graph to graph) is that the start-workload seems to
> run faster, at least on the writer node, but I am not sure if that's
> part of the question or just normal variance of the environment.