[lustre-discuss] Group Lock Semantics

Freddie Witherden freddie at witherden.org
Fri Feb 6 18:09:18 PST 2026


Hi,

We have an application where each rank needs to write data into
non-overlapping regions of a pre-existing file.  As the writes are for
checkpoints, there is no need to read any of the data back before the
file is closed.

A simple open(...), pwrite(...), pwrite(...), close(...) chain works
fine here.  However, as one might expect, the locking situation isn't ideal.
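
For reference, a minimal sketch of that chain in C.  The function names
(pwrite_all, write_checkpoint) and the two-region layout are hypothetical
and only illustrate the pattern; this is not our actual code.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all of buf at offset off, retrying on short writes and EINTR. */
int pwrite_all(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;          /* no progress; avoid looping forever */
        p += n;
        off += n;
        len -= (size_t)n;
    }
    return 0;
}

/* One rank's checkpoint: open the existing file, write two regions, close. */
int write_checkpoint(const char *path,
                     const void *a, size_t alen, off_t aoff,
                     const void *b, size_t blen, off_t boff)
{
    int fd = open(path, O_WRONLY);      /* file already exists: no O_CREAT */
    if (fd < 0)
        return -1;

    int rc = pwrite_all(fd, a, alen, aoff);
    if (rc == 0)
        rc = pwrite_all(fd, b, blen, boff);

    /* close() can report deferred write errors, so check it as well. */
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```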

We are therefore looking into using group locks where, after each rank
opens the file, it issues the relevant ioctl(..., LL_IOC_GROUP_LOCK,
...).  If we issue the same set of pwrites under a group lock we find
that data is occasionally missing.  Given our writes can straddle pages
this isn't surprising, as my understanding is that the page cache only
tracks whether a page is dirty, not which bytes within it were written.
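
A rough sketch of the group-lock variant, for concreteness.  The
function name and the gid argument are hypothetical.  The ioctl macros
normally come from the Lustre user headers; the fallback values below
are an assumption included only so the sketch builds without them, and
should be checked against the installed headers.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed fallback definitions; prefer the ones from the Lustre headers. */
#ifndef LL_IOC_GROUP_LOCK
#define LL_IOC_GROUP_LOCK   _IOW('f', 158, long)
#define LL_IOC_GROUP_UNLOCK _IOW('f', 159, long)
#endif

/* Open path, take group lock gid, write buf at off, unlock, and close.
 * Returns 0 on success and -1 on any failure. */
int write_under_group_lock(const char *path, int gid,
                           const void *buf, size_t len, off_t off)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    /* Assumption: all cooperating ranks pass the same gid. */
    if (ioctl(fd, LL_IOC_GROUP_LOCK, gid) != 0) {
        close(fd);
        return -1;              /* e.g. the file is not on a Lustre mount */
    }

    int rc = 0;
    const char *p = buf;
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n <= 0) {           /* error or no progress */
            rc = -1;
            break;
        }
        p += n;
        off += n;
        len -= (size_t)n;
    }

    /* Release the group lock before closing, reporting any failure. */
    if (ioctl(fd, LL_IOC_GROUP_UNLOCK, gid) != 0)
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

The sketch deliberately adds no fsync or O_DIRECT, since whether either
is required under a group lock is exactly the open question here.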

So, we reworked our code slightly to ensure that each page is only ever
written to by a single rank.  However, even here we find that data is
occasionally missing from the file, with the missing offsets
corresponding to boundaries between hosts.  We have even tried
increasing the alignment up to the stripe size for the file (so each
N MiB stripe is only ever written to by a single rank) but to no avail.
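
One way to pick such boundaries is sketched below.  The helper
aligned_range and its parameters are hypothetical and shown only to
make the constraint concrete; align would be the page size or the
stripe size, and is assumed to be non-zero with nranks positive.

```c
#include <stdint.h>

/* Give rank `rank` of `nranks` the half-open byte range [*start, *end) of
 * a file of `total` bytes, such that every boundary between two ranks is
 * a multiple of `align`.  Ranks may receive an empty range when there are
 * more ranks than aligned blocks. */
void aligned_range(uint64_t total, uint64_t align, int nranks, int rank,
                   uint64_t *start, uint64_t *end)
{
    uint64_t n = (uint64_t)nranks;
    uint64_t r = (uint64_t)rank;

    /* Number of align-sized blocks, counting a partial block at the end. */
    uint64_t nblocks = (total + align - 1) / align;

    /* Spread blocks evenly; the first `extra` ranks get one more block. */
    uint64_t base = nblocks / n;
    uint64_t extra = nblocks % n;
    uint64_t first = r * base + (r < extra ? r : extra);
    uint64_t count = base + (r < extra ? 1 : 0);

    uint64_t s = first * align;
    uint64_t e = (first + count) * align;

    /* Clamp to the file size so the last block may be partial. */
    *start = s < total ? s : total;
    *end = e < total ? e : total;
}
```

With total = 10000, align = 4096 and two ranks, rank 0 gets [0, 8192)
and rank 1 gets [8192, 10000), so no 4096-byte page is shared.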

Hence, I am wondering what the specific semantics are for writes under a
group lock.  Do we have to use O_DIRECT and bypass the page cache, or
are there stricter alignment requirements than page alignment?

Regards, Freddie.

