[lustre-discuss] Group Lock Semantics

Patrick Farrell pfarrell at ddn.com
Sat Feb 7 07:43:38 PST 2026


A suggestion:

Try just direct IO.  Group locks are complicated and have challenging semantics, including the unavoidable possibility of a stale file size, and they obviously require you to be very careful in your application.  In recent versions of Lustre, direct IO avoids taking locks on the client entirely; instead, locks are taken purely on the server side and only for the byte range covered by the IO itself.  This avoids all of the problems with shared-file lock contention for non-overlapping writes, while still giving fully expected semantics for overlapping reads and writes.

So, if you are able to switch to direct IO as you mention, group locks should be unnecessary and are better avoided.  Direct IO works like this in 2.15 and newer.  (Also, in 2.17, hybrid IO can do this switch for you automatically for larger IO sizes.)
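A minimal sketch of the direct IO approach described above, assuming each rank writes one contiguous, aligned block at byte offset rank * len of a shared file. The function name `write_rank_extent_dio`, the 4096-byte alignment, and the buffered-IO fallback are assumptions for illustration, not Lustre requirements. Nothing in the application code is Lustre-specific: on 2.15 and newer the difference (no client-side locks, server-side locks only for the written range) happens inside Lustre itself.

```c
#define _GNU_SOURCE /* exposes O_DIRECT in <fcntl.h> with glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: 4096-byte alignment for offset, length and buffer; confirm
 * what your kernel and Lustre version actually require for O_DIRECT. */
#define DIO_ALIGN 4096

/* Hypothetical helper: write `len` bytes of `fill` at byte offset
 * rank * len with O_DIRECT, so each rank owns a disjoint, aligned extent.
 * Returns 0 on success, -1 on failure with errno set. */
int write_rank_extent_dio(const char *path, int rank, size_t len,
                          unsigned char fill)
{
    if (rank < 0 || len == 0 || len % DIO_ALIGN != 0) {
        errno = EINVAL;
        return -1;
    }

    int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0 && errno == EINVAL)
        /* Portability fallback for local testing only: some filesystems
         * (e.g. older tmpfs) reject O_DIRECT, so retry with buffered IO. */
        fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    void *buf = NULL;
    int rc = posix_memalign(&buf, DIO_ALIGN, len); /* aligned user buffer */
    if (rc != 0) {
        close(fd);
        errno = rc;
        return -1;
    }
    memset(buf, fill, len);

    /* One aligned pwrite; a short write is reported as an error rather than
     * retried, because retrying from buf + n could break alignment. */
    ssize_t n = pwrite(fd, buf, len, (off_t)rank * (off_t)len);
    int saved = (n < 0) ? errno : EIO;
    free(buf);
    if (close(fd) != 0 && n == (ssize_t)len)
        return -1; /* close() set errno */
    if (n != (ssize_t)len) {
        errno = saved;
        return -1;
    }
    return 0;
}
```

Each rank (for example, each MPI process) would call `write_rank_extent_dio(path, rank, len, fill)` with the same path and len. When testing on Lustre itself, the fallback is better removed so that a failed O_DIRECT open is not silently turned into buffered IO.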

Patrick
________________________________
From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Oleg Drokin via lustre-discuss <lustre-discuss at lists.lustre.org>
Sent: Saturday, February 7, 2026 12:49 AM
To: lustre-discuss at lists.lustre.org <lustre-discuss at lists.lustre.org>; freddie at witherden.org <freddie at witherden.org>
Subject: Re: [lustre-discuss] Group Lock Semantics

Hello!

On Fri, 2026-02-06 at 18:09 -0800, Freddie Witherden via lustre-discuss wrote:

> So, we reworked our code slightly to ensure that each page is only ever
> written to by a single rank.  However, even here we occasionally find
> data missing from the file, at offsets corresponding to boundaries
> between hosts.  We have even tried increasing the size up to the stripe
> size for the file (so each N MiB stripe is only ever written to by a
> single rank), but to no avail.
>
> Hence, what, specifically, are the semantics for writes under a group
> lock?  Do we have to use O_DIRECT and bypass the page cache, or are
> there stricter alignment requirements than page alignment?

I think O_DIRECT was the primary intended use case, since otherwise
mixed IO from the same host might get confused about which pages are
covered by which locks, but in general it is still supposed to work
without any particular alignment requirements.
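The decomposition described in the question is meant to give every page (and later every stripe) a single writer. A small sanity check of that property, sketched under the assumption that each rank writes one contiguous, non-empty extent and the extents are sorted by offset: it reports the first pair of neighboring ranks whose extents touch the same g-byte unit. The `struct extent` type and `first_shared_unit` are hypothetical names; this only checks the application's partitioning and says nothing about how Lustre maps locks to pages.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one rank's contiguous byte range. */
struct extent {
    uint64_t off;
    uint64_t len; /* must be > 0 */
};

/* Return the index i of the first pair (ext[i], ext[i+1]) whose ranges
 * touch the same g-byte unit (page or stripe), or -1 if every unit has at
 * most one writer. Preconditions: ext sorted by off, non-overlapping,
 * every len > 0, and g > 0. Checking neighbors is sufficient: if ext[i]
 * and ext[j] (j > i + 1) shared a unit, ext[i+1] would lie inside it. */
long first_shared_unit(const struct extent *ext, size_t n, uint64_t g)
{
    for (size_t i = 0; i + 1 < n; i++) {
        uint64_t last_unit_a = (ext[i].off + ext[i].len - 1) / g;
        uint64_t first_unit_b = ext[i + 1].off / g;
        if (last_unit_a == first_unit_b)
            return (long)i;
    }
    return -1;
}
```

For example, with 4 KiB pages and a 1 MiB stripe size, a decomposition can pass the check with g equal to the page size and still fail it with g equal to the stripe size; running both shows which guarantee the partitioning actually provides.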

Do you happen to have a simple test case demonstrating the problem, by
any chance?
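A sketch of what a minimal reproducer might look like: every "rank" fills its own block with a distinct byte, then the file is read back and the first offset that does not hold its owner's pattern (for example, a hole that reads as zeros) is reported. It uses `fork()` to emulate ranks on a single client, so it does not exercise the cross-host boundaries where the missing data was observed; a real reproducer would launch ranks on separate clients (e.g. under MPI) and, for the group-lock case, take the group lock before writing (not shown). All names, the block size, and the fill pattern are hypothetical.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical fill pattern: rank r writes the byte 'A' + (r % 26). */
static unsigned char rank_byte(int rank)
{
    return (unsigned char)('A' + rank % 26);
}

/* Buffered write of one rank's block at offset rank * block; 0 on success. */
static int write_block(const char *path, int rank, size_t block)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    unsigned char *buf = malloc(block);
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    memset(buf, rank_byte(rank), block);
    ssize_t n = pwrite(fd, buf, block, (off_t)rank * (off_t)block);
    free(buf);
    int rc = close(fd);
    return (n == (ssize_t)block && rc == 0) ? 0 : -1;
}

/* Returns -1 if every block holds its owner's pattern, the first offset
 * that does not (e.g. a hole that reads back as zeros), or -2 on I/O error. */
long long verify_file(const char *path, int nranks, size_t block)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -2;
    unsigned char *buf = malloc(block);
    if (buf == NULL) {
        close(fd);
        return -2;
    }
    long long bad = -1;
    for (int r = 0; r < nranks && bad == -1; r++) {
        off_t base = (off_t)r * (off_t)block;
        ssize_t n = pread(fd, buf, block, base);
        if (n < 0) {
            bad = -2;
            break;
        }
        for (size_t i = 0; i < block; i++) {
            /* A short read means the file ends inside this rank's block. */
            if (i >= (size_t)n || buf[i] != rank_byte(r)) {
                bad = (long long)base + (long long)i;
                break;
            }
        }
    }
    free(buf);
    close(fd);
    return bad;
}

/* Fork one child per "rank" on this host, wait for all, then verify. */
long long run_single_host(const char *path, int nranks, size_t block)
{
    int failed = 0;
    for (int r = 0; r < nranks; r++) {
        pid_t pid = fork();
        if (pid < 0) {
            failed = 1;
            break;
        }
        if (pid == 0)
            _exit(write_block(path, r, block) == 0 ? 0 : 1);
    }
    int status;
    while (wait(&status) > 0)
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed = 1;
    return failed ? -2 : verify_file(path, nranks, block);
}
```

The split between writing and verifying is deliberate: `verify_file` can be run on its own against a file produced by the real multi-host application, which is where the missing data shows up.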

Bye,
   Oleg
_______________________________________________
lustre-discuss mailing list
lustre-discuss at lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

