[lustre-discuss] 2.15.4 hangs during mount using TCP
Hans Henrik Happe
happe at nbi.dk
Fri Mar 22 08:23:56 PDT 2024
Hi,
After updating to Lustre 2.15.4 I've had trouble mounting over TCP.
Using Infiniband works fine, but over TCP it just hangs without errors
on client or servers.
OS is Rocky 9.2 on client and CentOS 7.9 on servers running 2.12.9.
Rocky 9.2 + 2.15.3 works, but both Rocky 9.2 and 9.3 with 2.15.4 hang.
Anyone having the same issue?
A few notes about our system:
- It's ZFS based.
- It was created back in 2015. The MGS and MDTs have survived since then
(zfs send/receive), while new OSTs have been added over time and old ones
have been taken out.
- There are 2 filesystems on an MDS pair. One MDT on each MDS. Both have
the hanging problem.
- Dual network stack with Infiniband and TCP. For historical reasons we
are using tcp1 and not the default tcp0. No routers.
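
For readers unfamiliar with this kind of layout, a dual-stack LNet setup with a
non-default TCP network number is usually expressed as a module option roughly
like the sketch below. The interface names ib0 and eth0 and the network name
o2ib0 are assumptions for illustration only, not taken from the configuration
described in this post.

```
# /etc/modprobe.d/lustre.conf (illustrative sketch, not the actual config)
# o2ib0 = Infiniband network on ib0 (assumed interface name)
# tcp1  = TCP network numbered 1 instead of the default tcp0,
#         on eth0 (assumed interface name)
options lnet networks="o2ib0(ib0),tcp1(eth0)"
```

With such a layout, a TCP-only client has to use the same network number, so
its NIDs and the server NIDs it mounts from end in @tcp1 rather than @tcp.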
I'll dive into getting more debugging info out. Any pointers on how to
do this efficiently would be much appreciated.
Cheers,
Hans Henrik