[lustre-discuss] OSS Large File I/O using O_DIRECT

Andreas Dilger adilger at thelustrecollective.com
Mon Jan 12 10:02:38 PST 2026


On Jan 10, 2026, at 07:21, John Bauer <bauerj at iodoctors.com> wrote:
> 
> Andreas,
> Thank you for your reply.  I am not asking about the Hybrid I/O feature.  A bit more on that later.  I am asking about the oti_dio_pages ( Direct I/O ) path through the OSS.  With some experimentation last night I was able to confirm that an OSS uses Direct I/O when the incoming client request is 8MB or greater.  On the system I am running on ( clients version 2.14 and server version as reported by pminfo is lustre.sys.version=2 lustre.sys.build=0_ddn218 ) it appears that the reads and writes use the server cache when the user requests are less than 8MB and using non-rotational OSTs.  The OSS could have been configured this way.  I will check with the admin next week.

Yes, that is the default configuration.
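Assuming shell access to the OSS, the thresholds can be confirmed with lctl (parameter names as given later in this thread; the mount point and sizes below are only examples):

```shell
# On the OSS: show the size limits below which reads/writes are
# cached server-side (both default to 8 MiB on ldiskfs OSTs).
lctl get_param osd-ldiskfs.*.readcache_max_io_mb
lctl get_param osd-ldiskfs.*.writethrough_max_io_mb

# From a client: issue 8 MiB requests, which should take the
# non-buffered server path (mount point is an example).
dd if=/dev/zero of=/mnt/lustre/testfile bs=8M count=4 oflag=direct
```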

> Back to Hybrid I/O.  It concerns me that there is no mechanism in user space that will allow a client application to specify legacy client-side buffering on a per-file basis, independent of request size.  I envision plenty of cases where a client application is running dedicated on a 1TB memory client node, which leaves 500 MB for Lustre caching,

Presumably you mean 500GB for cache?

> and the application is doing large request I/O to a file that is under 500MB.  The client application would now be subject to Hybrid's direct I/O even though the entire file could be buffer cached on the client.  Am I misunderstanding something here?

In our testing (which is described in a couple of LUG presentations
as well as a FAST paper on Hybrid IO[*]), buffering pages in the
Linux VM is actually *slower* than sending the data directly over
the network when the IO size is above 4 MiB.  This is particularly
true when there are multiple threads accessing a single file
concurrently.

The reason is that even for read operations, the VM needs to take
a *write* lock on the address space in order to insert and remove
pages from cache, which serializes all cache operations on a file.
At modern network rates (100Gbps+ = 12GiB/s+) this means adding
and tracking millions of pages in the cache per second.  As soon
as there is any memory pressure, it also means finding and
removing millions of pages per second from the cache, which is a
lot of CPU overhead and serialization.
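As a sanity check on the "millions of pages per second" figure (assuming the usual 4 KiB page size; the 12 GiB/s rate is from the text above):

```shell
# 12 GiB/s of IO through the page cache at 4 KiB per page:
echo $(( 12 * 1024 * 1024 * 1024 / 4096 ))
# prints 3145728, i.e. ~3.1 million page insertions per second
```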

It's possible that the threshold between BIO and DIO will change
in newer kernels as large folios are introduced, which will reduce
page cache overhead by a couple of orders of magnitude (e.g. 256x
for a 1MiB stripe size, 1024x for a 4MiB stripe size), but that
has the risk of increasing false contention between clients if
they are not doing stripe-aligned IO...
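The reduction factors quoted above are simply the folio size divided by the 4 KiB base page size:

```shell
echo $(( 1024 * 1024 / 4096 ))       # 1 MiB folio -> 256x fewer cache entries
echo $(( 4 * 1024 * 1024 / 4096 ))   # 4 MiB folio -> 1024x fewer cache entries
```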

[*]
https://wiki.lustre.org/images/7/7d/LUG2023-Unaligned_DIO_v2-Farrell.pdf
https://wiki.lustre.org/images/a/a0/LUG2024-Hybrid_IO_Path_Update-Farrell.pdf
https://www.usenix.org/conference/fast24/presentation/qian
https://www.usenix.org/system/files/fast24-full_proceedings_interior.pdf




> Thanks again,
> John
> On 1/9/2026 7:02 PM, Andreas Dilger wrote:
>> Hi John,
>> are you asking about the Hybrid IO feature in 2.16+ Lustre releases,
>> or something else?
>> 
>> There is a client-side "llite.*.hybrid_io" parameter that can enable/disable Hybrid IO on a client completely, and the
>> llite.*.hybrid_io_write_threshold_bytes and
>> llite.*.hybrid_io_read_threshold_bytes can be used to tune the
>> IO size threshold where IO changes between buffered and direct.
>> 
>> Applications can of course use open(O_DIRECT) to use DIO instead of
>> buffered IO.
>> 
>> As for the server-side non-buffered IO path, this is controlled by
>> the osd-ldiskfs.*.readcache_max_io_mb and .writethrough_max_io_mb
>> parameters, both default to 8 MiB. Note that flash (non-rotational)
>> OSTs disable read and write cache entirely by default, since NVMe
>> devices are typically fast enough to handle incoming IO.
>> 
>> I don't think there is any way for clients to fetch these parameters
>> directly, since they are more a property of the OST than the client.
>> 
>> Cheers, Andreas
>> 
>> On Jan 9, 2026, at 14:32, John Bauer <bauerj at iodoctors.com <mailto:bauerj at iodoctors.com>> wrote:
>> 
>>> Hello all,
>>> Is there a way to determine what size a client I/O request must be to trigger the OSS to use the Large File ( non-buffered ) I/O path? Is this configurable?
>>> Is there a way for a client-side application to trigger this behavior independent of the I/O size?
>>> 
>> ---
>> Andreas Dilger
>> Principal Lustre Architect
>> adilger at thelustrecollective.com
>> 
>> 
>> 
>> 

---
Andreas Dilger
Principal Lustre Architect
adilger at thelustrecollective.com






More information about the lustre-discuss mailing list