[lustre-discuss] NFS kernel server does not seem to trigger refreshes

Peter Grandi pg at lustre.list.sabi.co.UK
Thu Apr 23 15:06:09 PDT 2026


So the context is EL8 4.18 kernel and in-kernel NFS server,
Lustre client 2.16.1, and for comparison NFS Ganesha 5.7;
using NFS protocol version 4.2 with both.

The issue is the vexed one of inter-node consistency in a
special case:

* Client "W" writing a log-file to a Lustre filesystem (every 3
  seconds).

* Lustre client "S" with an NFS server re-exporting the Lustre
  filesystem.

* NFS client "R" reading from NFS the log-file.
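
To make the lag measurable rather than eyeballed from 'tail -f',
the writer on "W" can stamp each line with its own write time. The
following is only a minimal sketch of such a writer, not the program
I actually use: the names 'format_line', 'lag_seconds' and
'write_log' and the line format are made up for illustration, and
the computed lag is only meaningful if the clocks on "W" and on the
reading node are synchronized.

```python
import time


def format_line(seq: int, now: float) -> str:
    """Build one log line that carries its own wall-clock write time."""
    return f"{now:.3f} seq={seq}\n"


def lag_seconds(line: str, read_time: float) -> float:
    """Seconds between the write time embedded in a line and the read time."""
    written = float(line.split()[0])
    return read_time - written


def write_log(path: str, interval: float = 3.0, count: int = 10) -> None:
    """Append `count` timestamped lines to `path`, one every `interval` seconds."""
    with open(path, "a") as f:
        for seq in range(count):
            f.write(format_line(seq, time.time()))
            f.flush()  # hand the line to the kernel instead of buffering it
            time.sleep(interval)
```

On the reading side, calling 'lag_seconds(line, time.time())' for
each line as it appears gives a per-line lag figure to compare
across the cases below.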

What I observe:

* A 'tail -f' of the log-file on "S" itself has little to no
  perceivable lag thanks to the Lustre DLM.

* A 'tail -f' of the log file on "R" has a few seconds of lag if
  I use the NFS Ganesha server on "S" (the lag depends on some
  NFS Ganesha caching parameters).

* A 'tail -f' of the log file on "R" can have a rather long lag
  if I use the NFS kernel server on "S" but it is erratic (seems
  to depend on how often I reopen the log).

* A 'tail -f' of the log file on "R" has only a small lag if I
  use the NFS kernel server on "S" and I write to the log-file
  on "S" itself instead of on "W" (this seems to indicate that
  the lag does not come from the NFS side, that is from the
  presence or absence of delegations or from various NFS-side
  caching timeouts; I have done several tests).

Note: in the latter case neither the 'mtime' of the i-node nor
the contents of the file get updated.
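
One way to record when the size and 'mtime' become visible on a
given node is to poll 'stat()' and log each change. This is a
minimal sketch under assumptions: 'watch_stat' is a hypothetical
helper, the polling interval is arbitrary, and on an NFS client the
'stat()' results are themselves subject to attribute caching, so the
numbers describe what that node sees, not what the server has.

```python
import os
import time
from typing import Iterator, Tuple


def watch_stat(path: str, interval: float = 1.0) -> Iterator[Tuple[float, int, float]]:
    """Yield (observed_at, size, mtime) each time size or mtime changes."""
    last = None
    while True:
        st = os.stat(path)
        cur = (st.st_size, st.st_mtime)
        if cur != last:
            last = cur
            yield (time.time(), st.st_size, st.st_mtime)
        else:
            time.sleep(interval)  # nothing new yet, wait before polling again
```

The difference 'observed_at - mtime' for each yielded tuple is then
a rough staleness figure, again assuming synchronized clocks.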

The difference seems to be that:

* User-program level access to the log-file on Lustre does trigger
  the DLM to refresh its cached state.

* Kernel-level access to the log-file on Lustre does not seem to
  trigger the DLM to refresh its cached state.
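
A possible experiment to test this hypothesis is to run a small
user-level process on "S" that periodically stats and reads the
tail of the log-file while "R" reads it through the NFS kernel
server, and see whether the lag on "R" shrinks. The sketch below is
only that, an experiment: 'touch_read' and 'probe' are hypothetical
names, and whether such access refreshes what the kernel NFS server
sees is exactly the open question.

```python
import os
import time
from typing import List


def touch_read(path: str, tail_bytes: int = 4096) -> int:
    """Stat and read the tail of `path` from user space; return the size seen."""
    size = os.stat(path).st_size
    with open(path, "rb") as f:
        f.seek(max(0, size - tail_bytes))
        f.read(tail_bytes)
    return size


def probe(path: str, interval: float = 1.0, rounds: int = 60) -> List[int]:
    """Repeat touch_read() `rounds` times, returning the sizes observed."""
    sizes = []
    for _ in range(rounds):
        sizes.append(touch_read(path, 4096))
        time.sleep(interval)
    return sizes
```

If the sizes returned by 'probe' on "S" grow every 3 seconds while
"R" still lags, that would point at the kernel-level access path
rather than at the cached state as such.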

I would be happy to just use the NFS Ganesha server but it has
another flaw: it just hangs every 1-3 days during periods of
concurrent access on the NFS client (which I suspect is due to
some incompatibility between the Linux NFS kernel client and the
NFS Ganesha server).

One way to work around the issue would be to limit the
time-to-live of cached Lustre file contents and attributes.
Various web searches indicate that some Lustre client versions
used to have caching timeouts, but looking at 'lctl' and under
'/sys/' and '/proc/' in 2.16.1 I cannot see anything relevant
except the 'inode_cache' and 'xattr_cache' toggles, and those
seem a bit drastic.
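
For anyone wanting to repeat the search, a scan along these lines
lists candidate tunables by name. It is a rough sketch:
'find_tunables' is a hypothetical helper, the substrings "cache" and
"timeout" are just my guess at what a relevant name would contain,
and the directories to scan (for example '/sys/fs/lustre' and
'/proc/fs/lustre') are assumptions that may differ between versions.

```python
import os
from typing import Iterable, List


def find_tunables(roots: Iterable[str],
                  needles: Iterable[str] = ("cache", "timeout")) -> List[str]:
    """Return sorted paths under `roots` whose file name contains any needle."""
    wanted = tuple(needles)
    hits = []
    for root in roots:
        # os.walk silently yields nothing for a root that does not exist
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                if any(n in name for n in wanted):
                    hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

A call such as 'find_tunables(["/sys/fs/lustre", "/proc/fs/lustre"])'
only matches on names, so it can miss a relevant parameter with an
unexpected name.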

But I can see that on an NFS-Lustre server where there is no
local access to Lustre files other than from the NFS server that
might be an option.

Any suggestions or workarounds?

