[lustre-discuss] MGS mount succeeds then unmounts due to MGC timeout

Santiago Freire - InCo sfreire at fing.edu.uy
Mon Jan 19 12:13:07 PST 2026


Hello everyone,

I’m troubleshooting an issue on Rocky Linux 8.10 where an MGS mount 
appears to succeed (exit code 0) but the server unmounts a few seconds 
later due to MGC request timeouts. Because of this, MDT/OST targets 
cannot register with the MGS afterwards and I can't get a working 
filesystem.

At first I thought this was related to switching from ldiskfs to ZFS 
(OpenZFS DKMS), because the problem started after installing ZFS DKMS 
(from the Lustre repo) and rebuilding modules. However, I reproduced the 
same behavior even when using the kmod-based Lustre packages and also 
when trying ldiskfs again, so I'm a bit lost on what could have caused 
the issue.

I'm running Lustre 2.15.7 on Rocky 8.10, and this behavior happens on 
the (only) MGS/MDS node. To reproduce the issue, I only need to format 
the MGT and then mount it normally. The mount command returns success, 
but shortly after that the server unmounts automatically.
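
For anyone who wants to try this, the reproduction can be sketched roughly as below. This is only a sketch under assumptions: the device path and mount point are placeholders that do not come from my setup as described, and the ldiskfs backend is used for simplicity. Note that mkfs.lustre destroys whatever is on the device.

```shell
# Placeholders (assumptions, not the real device or mount point).
MGT_DEV=${MGT_DEV:-/dev/sdX}
MGT_MNT=${MGT_MNT:-/mnt/mgt}

# Defined as a function so nothing destructive runs until it is called.
reproduce_mgs_unmount() {
    # Format a standalone MGT on the ldiskfs backend (wipes the device).
    mkfs.lustre --mgs --backfstype=ldiskfs "$MGT_DEV" || return 1
    mkdir -p "$MGT_MNT"

    # Mount the MGT; in the failing case this still reports exit code 0.
    mount -t lustre "$MGT_DEV" "$MGT_MNT"
    echo "mount exit code: $?"

    # Wait a little, then see whether the target is still mounted
    # and what the kernel logged in the meantime.
    sleep 15
    grep lustre /proc/mounts || echo "no Lustre target mounted anymore"
    dmesg | tail -n 20
}
```

Calling reproduce_mgs_unmount as root, with MGT_DEV pointing at a scratch device, runs the whole sequence.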

Example output in dmesg:

Lustre: 236012:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request
sent has timed out for slow reply: [sent 1768852634/real 1768852634]
req@000000001ae72b11 x1854776314691712/t0(0)
o251->MGC10.0.0.4@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1768852640 ref
2 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'

Lustre: server umount MGS complete
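
As a reading aid for the timeout line, here is a small sketch that pulls out the send time, the deadline, the opcode and the target. The line is the dmesg message above joined onto one line, with `@` in the request and NID fields. The variable names are made up for illustration. To my understanding opcode 251 corresponds to MGS_DISCONNECT in Lustre's mgs_cmd enum, but that is an assumption worth checking against the headers of the installed version.

```shell
# The dmesg message from above, joined onto a single line.
line="Lustre: 236012:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1768852634/real 1768852634] req@000000001ae72b11 x1854776314691712/t0(0) o251->MGC10.0.0.4@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1768852640 ref 2 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'"

# "[sent <epoch>/real ..." is when the request was sent.
sent=$(printf '%s\n' "$line" | sed -n 's/.*\[sent \([0-9]*\)\/real.*/\1/p')
# "dl <epoch>" is the deadline after which the request is expired.
deadline=$(printf '%s\n' "$line" | sed -n 's/.* dl \([0-9]*\) .*/\1/p')
# "o<number>->" is the RPC opcode.
opcode=$(printf '%s\n' "$line" | sed -n 's/.* o\([0-9]*\)->.*/\1/p')
# The text between "->" and the next ":" is the import name and NID.
target=$(printf '%s\n' "$line" | sed -n 's/.*->\([^ :]*\):.*/\1/p')

echo "opcode:  $opcode"
echo "target:  $target"
echo "timeout: $((deadline - sent)) seconds between send and deadline"
```

The target ends in @0@lo, so the request that timed out was addressed to the MGS over the loopback NID on the same node.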

Things I've checked or tried so far:

  * lnetctl ping <mgs_ip>@tcp works
  * lctl list_nids shows the correct NID
  * lnetctl net show shows the TCP NI on the correct interface and the
    loopback
  * lnetctl ping <mgs_ip>@tcp from another host works
  * Port 988 is listening and open
  * Disabling firewalld does not change anything
  * SELinux is disabled
  * Removed all Lustre, zfs, kmod and dkms packages, rebuilt initramfs,
    changed to stock kernel and back to custom Lustre kernel,
    reinstalled all packages, etc.
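
The connectivity and security checks in the list can be sketched as one function, so they are easy to re-run after each change. This is a sketch, not a complete diagnostic: the NID default assumes the MGS address is 10.0.0.4 on tcp, as suggested by the MGC name in the log, and the function name is made up.

```shell
# Assumption: MGS NID taken from the MGC name in the dmesg line.
MGS_NID=${MGS_NID:-10.0.0.4@tcp}

check_mgs_node() {
    lnetctl ping "$MGS_NID"          # LNet-level reachability of the MGS NID
    lctl list_nids                   # NIDs configured on this node
    lnetctl net show                 # network interfaces known to LNet
    ss -tln | grep -w 988            # is anything listening on port 988?
    systemctl is-active firewalld    # prints "inactive" when firewalld is stopped
    getenforce                       # prints "Disabled" when SELinux is off
}
```

Running check_mgs_node on the MGS node, and the lnetctl ping line alone from another host, covers the first six items above.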

However, nothing worked, and I can't explain the issue or why I can't 
even mount with plain ldiskfs anymore.

Does anyone know the cause of this issue and what I could do to fix 
it? My last resort would be reinstalling the OS and starting from 
scratch, but I would very much prefer not to do that. This is a testing 
environment, so I don't mind having to reformat, recreate or reinstall 
anything.

Thank you very much in advance.

Santiago
