[lustre-discuss] Group Lock Semantics
Freddie Witherden
freddie at witherden.org
Fri Feb 6 18:09:18 PST 2026
Hi,
We have an application where each rank needs to write data into
non-overlapping regions of a pre-existing file. As the writes are for
checkpoints there is no need to read any of the data back before the
file is closed.
A simple open(...), pwrite(...), pwrite(...), close(...) chain works
fine here. However, as one might expect, the locking situation isn't ideal.
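For concreteness, a minimal sketch of that chain (illustrative rather
than our actual code; the names pwrite_all and write_rank_region are
made up for this example, and a single region per call is assumed):

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes at offset off, retrying on short writes and
 * EINTR. Returns 0 on success, -1 on error with errno set. */
static int pwrite_all(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0) {           /* no progress: report as an I/O error */
            errno = EIO;
            return -1;
        }
        p += n;
        off += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical per-rank checkpoint write: the file already exists and
 * [off, off + len) is this rank's private, non-overlapping region. */
int write_rank_region(const char *path, const void *buf,
                      size_t len, off_t off)
{
    assert(off >= 0);
    int fd = open(path, O_WRONLY);  /* pre-existing file: no O_CREAT */
    if (fd < 0)
        return -1;
    int rc = pwrite_all(fd, buf, len, off);
    if (close(fd) != 0)             /* close can also report write errors */
        rc = -1;
    return rc;
}
```

Each rank would call write_rank_region with its own offset; nothing is
read back before close.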
We are therefore looking into using group locks, where each rank, after
opening the file, issues the relevant ioctl(..., LL_IOC_GROUP_LOCK,
...). If we issue our same set of pwrites under a group lock, we find
that data is occasionally missing. Given that our writes can straddle
pages, this isn't surprising: my understanding is that the page cache
only tracks whether a whole page is dirty, not which bytes within it.
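A rough sketch of how the group lock is taken and released around the
writes (again illustrative, not our actual code). It assumes the Lustre
user-space header <lustre/lustreapi.h> is installed and that every rank
passes the same non-zero gid; group_lock and group_unlock are
hypothetical wrapper names. Without the header the wrappers simply fail
with ENOTSUP so the snippet still builds on a non-Lustre machine:

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>

/* Pull in the Lustre ioctl definitions when the header is available. */
#if defined(__has_include)
#  if __has_include(<lustre/lustreapi.h>)
#    include <lustre/lustreapi.h>
#  endif
#endif

/* Take the group lock identified by gid on an open Lustre file.
 * Returns 0 on success, -1 on error with errno set. */
int group_lock(int fd, int gid)
{
    assert(gid != 0);   /* assumption: all ranks share one non-zero gid */
#ifdef LL_IOC_GROUP_LOCK
    return ioctl(fd, LL_IOC_GROUP_LOCK, gid);
#else
    (void)fd;
    errno = ENOTSUP;    /* built without Lustre headers */
    return -1;
#endif
}

/* Release the group lock previously taken with the same gid. */
int group_unlock(int fd, int gid)
{
    assert(gid != 0);
#ifdef LL_IOC_GROUP_UNLOCK
    return ioctl(fd, LL_IOC_GROUP_UNLOCK, gid);
#else
    (void)fd;
    errno = ENOTSUP;
    return -1;
#endif
}
```

The intended order on each rank is open, group_lock, the pwrites,
group_unlock, close.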
So, we reworked our code slightly to ensure that each page is only ever
written to by a single rank. However, even here we find that data is
occasionally missing from the file, at offsets corresponding to
boundaries between hosts. We have even tried increasing the alignment
up to the stripe size for the file (so each N MiB stripe is only ever
written to by a single rank) but to no avail.
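To make the alignment scheme concrete, here is one way such a split
could be computed. This is a sketch under assumptions, not our exact
decomposition: it assumes each rank owns one contiguous range and that
we are free to choose the boundaries; struct range and rank_range are
names invented for the example, and align stands for either the page
size or the stripe size.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open byte range [start, end) owned by one rank. */
struct range {
    uint64_t start;
    uint64_t end;
};

/* Split a file of `size` bytes across `nranks` ranks so that every
 * `align`-sized block (page or stripe) is owned by exactly one rank. */
struct range rank_range(uint64_t size, uint64_t align, int nranks, int rank)
{
    assert(align > 0 && nranks > 0 && rank >= 0 && rank < nranks);

    /* Number of align-sized blocks, counting a partial final block. */
    uint64_t nblocks = (size + align - 1) / align;

    /* Distribute whole blocks as evenly as possible. */
    uint64_t lo = nblocks * (uint64_t)rank / (uint64_t)nranks;
    uint64_t hi = nblocks * (uint64_t)(rank + 1) / (uint64_t)nranks;

    struct range r = { lo * align, hi * align };
    if (r.end > size)       /* only the final block may be partial */
        r.end = size;
    return r;
}
```

Every interior boundary is a multiple of align, so no page (or stripe)
is shared between two ranks.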
Hence, I am wondering what the specific semantics are for writes under a
group lock. Do we have to use O_DIRECT and bypass the page cache? Are
there alignment requirements stricter than page granularity?
Regards, Freddie.