[lustre-discuss] Lnet configuration and debugging
Steve Brasier
steveb at stackhpc.com
Tue Sep 17 02:50:59 PDT 2024
Hi.
I've got a Rocky Linux 9.4 client running lustre 2.15.5-1.el9 with this
/etc/lnet.conf:
[root@stg-login-0 rocky]# cat /etc/lnet.conf
net:
    - net: tcp1
      interfaces:
          0: eth0
Running systemctl start lnet just hangs forever, with the syslog showing
only:
Sep 13 15:31:35 stg-login-0 systemd[1]: Starting lnet management...
and it's actually the command below which hangs:
[root@stg-login-0 rocky]# /usr/sbin/lnetctl import /etc/lnet.conf
i.e. module load and lnet configure work OK.
However it looks like it autoconfigured an interface on tcp (not tcp1):
[root@stg-login-0 rocky]# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: 10.179.2.45@tcp
          status: up
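To narrow down which startup step blocks, the three steps mentioned above (module load, lnet configure, import) can be run one at a time with a timeout. The following is only a minimal sketch: the step list is an assumption based on the description in this message and should be checked against the ExecStart lines of the installed lnet unit, and the helper names run_step and check_lnet_startup are hypothetical.

```python
import subprocess
import sys

# Assumed startup sequence, taken from the description in this message;
# compare with the ExecStart lines of the lnet unit actually installed.
LNET_STEPS = [
    ["modprobe", "lnet"],
    ["lnetctl", "lnet", "configure"],
    ["lnetctl", "import", "/etc/lnet.conf"],
]


def run_step(cmd, timeout_s=30.0):
    """Run one command and return (status, output).

    status is 'ok', 'failed' (non-zero exit), 'hung' (timeout expired)
    or 'missing' (executable not found).
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return "hung", ""
    except FileNotFoundError:
        return "missing", ""
    status = "ok" if proc.returncode == 0 else "failed"
    return status, proc.stdout + proc.stderr


def check_lnet_startup(steps=LNET_STEPS, timeout_s=30.0):
    """Run each step in order, stopping at the first one that is not 'ok'."""
    for step in steps:
        status, output = run_step(step, timeout_s)
        print(" ".join(step), "->", status, file=sys.stderr)
        if output:
            print(output.strip(), file=sys.stderr)
        if status != "ok":
            return step, status
    return None, "ok"
```

Calling check_lnet_startup() as root would report the first step that fails or exceeds the timeout. One caveat: if the process is blocked inside the kernel in an uninterruptible state, the kill issued on timeout may not take effect and the helper itself could still block.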
So:
1. How can I debug this hang, please?
2. Do the client and server NIDs need to be in the same IPv4 subnet? I have
a client NID of 10.179.2.45@tcp1 and a server NID of 10.167.128.1@tcp1,
with IP routing between them such that ICMP ping works between them. Is
that OK?
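To make the premise of question 2 concrete, the two NIDs can be split into address and LNet network name and the addresses compared for a given prefix length. This is only an illustrative sketch: the NIDs are written in the usual address@net form, the prefix lengths are assumptions (the actual netmasks are not stated in this message), and the helper names split_nid and same_subnet are hypothetical. It says nothing about whether LNet itself requires a shared subnet.

```python
import ipaddress


def split_nid(nid):
    """Split an 'address@net' NID string into (address, net)."""
    addr, _, net = nid.partition("@")
    return addr, net


def same_subnet(ip_a, ip_b, prefix_len):
    """Return True if ip_b lies in the network containing ip_a/prefix_len."""
    network = ipaddress.ip_network(f"{ip_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(ip_b) in network


client_addr, client_net = split_nid("10.179.2.45@tcp1")
server_addr, server_net = split_nid("10.167.128.1@tcp1")

# Both NIDs name the same LNet network (tcp1).
print(client_net == server_net)

# Prefix lengths below are assumed for illustration only.
for prefix_len in (24, 16):
    print(prefix_len, same_subnet(client_addr, server_addr, prefix_len))
```

For both assumed prefix lengths the addresses fall in different IPv4 subnets, which matches the routed setup described in the question.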
Many thanks for any help!
http://stackhpc.com/
Please note I work Tuesday to Friday.