[lustre-discuss] O_DIRECT writes to 2nd PFL component dumps 1st PFL component from cache

John Bauer bauerj at iodoctors.com
Thu Jan 15 12:34:54 PST 2026


All,

I am back to trying to emulate Hybrid I/O  from user space, doing direct 
and buffered I/O to the same file concurrently.  I open a file twice, 
once with O_DIRECT, and once without.  Note that you will see 2 
different file names involved, buffered.dat and direct.dat.  direct.dat 
is a symlink to buffered.dat and this is done so my tool can more easily 
display the direct and non-direct I/O differently.  The file has a 
two-component PFL layout, both components in ssd-pool with 4 stripes of 
32M: the first component covers [0, 512M) on OSTs 100,101,102,103 and the 
second covers [512M, EOF) on OSTs 104,105,106,107.  The application first 
writes 512M ( 32M per write ) to only the first PFL component using the 
non-direct fd.  
Then the application writes 512M ( 32M per write ) alternating between 
the direct fd and non-direct fd.  The very first write ( using direct ) 
into the 2nd component triggers the dump of the entire first component 
from buffer cache.  From that point on the 2 OSC that handle the 
non-direct writes accumulate cache.  The 2 OSC that handle the direct 
writes accumulate no cache.  My question: Why does Lustre dump the 1st 
component from buffer cache?  The 1st and 2nd component do not even 
share OSCs.  Lustre has no problem dealing with direct and non-direct 
I/O in the same component (2nd component in this case).  To me it would 
seem that if Lustre can correctly buffer direct and non-direct in the 
same component, it should be able to correctly buffer direct and 
non-direct in multiple components.  My ultimate goal is to have the 
first, and smaller component, remain cached, and the remainder of the 
file use direct I/O, but as soon as I do a direct I/O, I lose all my 
buffer cache.
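
For anyone who wants to reproduce this, here is a minimal sketch of the 
write pattern described above. It is not the actual test program: the 
function names (write_offset, write_is_direct, stripe_index, run_pattern) 
are made up for illustration, the 4096-byte buffer alignment is an 
assumption, and the file is assumed to already exist with the PFL layout 
described above. stripe_index() also assumes plain round-robin striping 
within each component.

```c
#define _GNU_SOURCE              /* needed for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MIB          (1024LL * 1024LL)
#define WRITE_SIZE   (32 * MIB)    /* one write covers one 32M stripe */
#define COMP1_END    (512 * MIB)   /* end of the first PFL component */
#define TOTAL_SIZE   (1024 * MIB)  /* 512M buffered + 512M alternating */
#define STRIPE_COUNT 4
#define NUM_WRITES   ((int)(TOTAL_SIZE / WRITE_SIZE))

/* Byte offset of write number n (0-based). */
long long write_offset(int n)
{
    return (long long)n * WRITE_SIZE;
}

/* 1 if write n uses the O_DIRECT fd, 0 if it uses the buffered fd.
 * Everything in the first component is buffered; in the second
 * component the writes alternate, starting with a direct write. */
int write_is_direct(int n)
{
    long long off = write_offset(n);

    if (off < COMP1_END)
        return 0;
    return ((off - COMP1_END) / WRITE_SIZE) % 2 == 0;
}

/* Stripe index (0..3) inside the component that write n lands in,
 * assuming simple round-robin striping within the component. */
int stripe_index(int n)
{
    long long off = write_offset(n);
    long long base = off < COMP1_END ? 0 : COMP1_END;

    return (int)(((off - base) / WRITE_SIZE) % STRIPE_COUNT);
}

/* Issue the whole pattern. buffered_path is the real file and
 * direct_path is the symlink to it. Returns 0 on success, -1 on error. */
int run_pattern(const char *buffered_path, const char *direct_path)
{
    void *buf = NULL;
    int rc = -1;
    int bfd = open(buffered_path, O_WRONLY);
    int dfd = open(direct_path, O_WRONLY | O_DIRECT);

    if (bfd < 0 || dfd < 0)
        goto out;
    /* O_DIRECT needs an aligned buffer; 4096 is an assumed alignment. */
    if (posix_memalign(&buf, 4096, (size_t)WRITE_SIZE) != 0) {
        buf = NULL;
        goto out;
    }
    memset(buf, 0xab, (size_t)WRITE_SIZE);

    for (int n = 0; n < NUM_WRITES; n++) {
        int fd = write_is_direct(n) ? dfd : bfd;

        /* A short write is treated as an error in this sketch. */
        if (pwrite(fd, buf, (size_t)WRITE_SIZE, (off_t)write_offset(n))
            != (ssize_t)WRITE_SIZE)
            goto out;
    }
    rc = 0;
out:
    free(buf);
    if (dfd >= 0)
        close(dfd);
    if (bfd >= 0)
        close(bfd);
    return rc;
}
```

Calling run_pattern("buffered.dat", "direct.dat") issues 32 writes: 16 
buffered writes into the first component, then 16 alternating writes into 
the second. Under the round-robin assumption the direct writes always land 
on stripes 0 and 2 of the second component and the buffered writes on 
stripes 1 and 3, which matches two OSCs accumulating cache and two 
accumulating none.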

The top frame of the plot is the amount of cache used by each OSC versus 
time. The bottom frame of the plot is the File Position Activity versus 
time.  Next to each pwrite64() depicted, I indicate which OSC is being 
written to.  I have also colored the pwrite64()s by whether they used 
the direct fd (green) or non-direct fd (red).  As soon as the 2nd PFL 
component is touched by a direct write, that write waits until the OSCs 
of the first PFL component dump all their cache.

John

Image 1 :

https://www.dropbox.com/scl/fi/d7seezfj0gtxo1y7lzpvy/split_direct.png?rlkey=0sfo1erxo5ua1aef5ijfc81jx&st=pxb0qnts&dl=0
-------------- next part --------------
A non-text attachment was scrubbed...
Name: split_direct.png
Type: image/png
Size: 81577 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20260115/1e1f0f1e/attachment-0001.png>

