[lustre-discuss] 2.15.4 hangs during mount using TCP

Hans Henrik Happe happe at nbi.dk
Fri Mar 22 08:23:56 PDT 2024


Hi,

After updating to Lustre 2.15.4 I've had trouble mounting over TCP.
Using InfiniBand works fine, but over TCP the mount just hangs, with no
errors on the client or the servers.

The client runs Rocky 9.2; the servers run CentOS 7.9 with Lustre 2.12.9.

Rocky 9.2 + 2.15.3 works, but both Rocky 9.2 and 9.3 with 2.15.4 hang.

Anyone having the same issue?

A few notes about our system:

- It's ZFS based.
- It was created back in 2015. The MGS and MDTs have survived since then
(zfs send/receive), while new OSTs have been added over time and old ones
have been taken out.
- There are 2 filesystems on an MDS pair. One MDT on each MDS. Both have 
the hanging problem.
- Dual network stack with InfiniBand and TCP. For historical reasons we
are using tcp1 and not the default tcp0. No routers.
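
For illustration, a dual-stack LNet setup of this kind is commonly
expressed as a module option along the lines below. This is only a
sketch: the interface names (ib0, eth0), the o2ib0 network name and the
file path are assumptions, not taken from the actual configuration; only
tcp1 comes from the description above.

```
# /etc/modprobe.d/lustre.conf (hypothetical path and contents)
# o2ib0 on ib0 and the interface names are assumed; tcp1 is the
# non-default TCP network mentioned above.
options lnet networks="o2ib0(ib0),tcp1(eth0)"
```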

I'll dive into getting more debugging info out. Any pointers on how to 
do this efficiently would be much appreciated.

Cheers,
Hans Henrik
