[lustre-discuss] Lnet configuration and debugging

Steve Brasier steveb at stackhpc.com
Tue Sep 17 02:50:59 PDT 2024


Hi.

I have a Rocky Linux 9.4 client running Lustre 2.15.5-1.el9 with the
following /etc/lnet.conf:

[root@stg-login-0 rocky]# cat /etc/lnet.conf
net:
    - net: tcp1
        interfaces:
            0: eth0

Running systemctl start lnet hangs indefinitely, and the only syslog entry
is:
Sep 13 15:31:35 stg-login-0 systemd[1]: Starting lnet management...

Specifically, it is this step that hangs:
[root@stg-login-0 rocky]# /usr/sbin/lnetctl import /etc/lnet.conf
The earlier steps (module load and lnet configure) work OK.
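One generic Linux way to see where a command like this is stuck is to let it run for a bounded time and, if it has not exited, snapshot its process state, kernel wait channel and kernel stack from /proc. The sketch below is a hypothetical helper (run_with_hang_report and HangReport are illustrative names, not part of Lustre); it assumes a Linux /proc, and reading /proc/<pid>/stack normally requires root.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class HangReport:
    timed_out: bool
    returncode: Optional[int]
    state: str         # "State:" field from /proc/<pid>/status, e.g. "D (disk sleep)"
    wchan: str         # kernel wait channel (symbol name, or "0")
    kernel_stack: str  # /proc/<pid>/stack (normally readable by root only)


def read_proc(pid: int, name: str) -> str:
    """Return the contents of /proc/<pid>/<name>, or a placeholder if unreadable."""
    try:
        with open(f"/proc/{pid}/{name}") as f:
            return f.read().strip()
    except OSError as exc:  # e.g. PermissionError for 'stack' when not root
        return f"<unreadable: {type(exc).__name__}>"


def proc_state(pid: int) -> str:
    """Extract the State: line from /proc/<pid>/status."""
    for line in read_proc(pid, "status").splitlines():
        if line.startswith("State:"):
            return line.split(":", 1)[1].strip()
    return "<unknown>"


def run_with_hang_report(cmd: Sequence[str], timeout: float) -> HangReport:
    """Run cmd; if it has not exited after `timeout` seconds, record where it is blocked."""
    proc = subprocess.Popen(list(cmd))  # stdout/stderr inherited, so error text stays visible
    try:
        rc = proc.wait(timeout=timeout)
        return HangReport(False, rc, "", "", "")
    except subprocess.TimeoutExpired:
        report = HangReport(
            timed_out=True,
            returncode=None,
            state=proc_state(proc.pid),
            wchan=read_proc(proc.pid, "wchan"),
            kernel_stack=read_proc(proc.pid, "stack"),
        )
        # A process in uninterruptible sleep (state "D") will not actually
        # exit until the kernel call it is blocked in returns.
        proc.kill()
        return report
```

As root, run_with_hang_report(["/usr/sbin/lnetctl", "import", "/etc/lnet.conf"], timeout=30) would print nothing extra on success, and on a hang would return the state and kernel stack of the stuck lnetctl. If the stack shows it waiting inside LNet code, the Lustre kernel debug log (for example, lctl set_param debug=+net before the import and lctl dk afterwards) may give more detail.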

However, it looks like LNet has autoconfigured an interface on tcp (not
tcp1):
[root@stg-login-0 rocky]# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: 10.179.2.45@tcp
          status: up
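Which LNet networks actually came up can be read off this output mechanically. The following is a rough, line-based sketch (not a YAML parser; nets_from_net_show is a hypothetical name) that assumes one key per line, indented as in the output above, and maps each net type to its NIDs. Applied to that output it shows only lo and tcp, with no tcp1 entry.

```python
import re
from typing import Dict, List, Optional

# Matches e.g. "    - net type: tcp" and "        - nid: 10.179.2.45@tcp"
NET_TYPE_RE = re.compile(r"^\s*-\s*net type:\s*(\S+)\s*$")
NID_RE = re.compile(r"^\s*-\s*nid:\s*(\S+)\s*$")


def nets_from_net_show(text: str) -> Dict[str, List[str]]:
    """Map each LNet network type in `lnetctl net show` output to its NIDs."""
    nets: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        m = NET_TYPE_RE.match(line)
        if m:
            current = m.group(1)
            nets.setdefault(current, [])
            continue
        m = NID_RE.match(line)
        if m and current is not None:
            nets[current].append(m.group(1))
    return nets


# The `lnetctl net show` output quoted above.
SAMPLE = """\
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: 10.179.2.45@tcp
          status: up
"""
```

On a live node the text could come from subprocess.run(["lnetctl", "net", "show"], capture_output=True, text=True).stdout instead of SAMPLE.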

So:
1. How can I debug this hang, please?

2. Do the client and server NIDs need to be in the same IPv4 subnet? I have
a client NID of 10.179.2.45@tcp1 and a server NID of 10.167.128.1@tcp1,
with IP routing between them so that ICMP ping works between them. Is
that OK?
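The netmask is not given, so the prefix lengths below are assumptions. This short sketch (nid_address and same_subnet are hypothetical helpers) strips the @network suffix from each TCP NID and tests subnet membership with Python's standard ipaddress module. It shows the two addresses share only the 10.0.0.0/8 prefix, so under a /24 or /16 netmask they are in different IP subnets and reach each other only through the IP routing described above.

```python
import ipaddress


def nid_address(nid: str) -> ipaddress.IPv4Address:
    """Return the IPv4 part of a TCP NID such as '10.179.2.45@tcp1'."""
    addr, _, _net = nid.partition("@")
    return ipaddress.IPv4Address(addr)


def same_subnet(nid_a: str, nid_b: str, prefix_len: int) -> bool:
    """True if both NIDs' addresses fall in the same IPv4 network of the given prefix length."""
    net = ipaddress.IPv4Network(f"{nid_address(nid_a)}/{prefix_len}", strict=False)
    return nid_address(nid_b) in net


CLIENT = "10.179.2.45@tcp1"
SERVER = "10.167.128.1@tcp1"

for prefix in (24, 16, 8):
    print(f"/{prefix}: same subnet = {same_subnet(CLIENT, SERVER, prefix)}")
# prints:
# /24: same subnet = False
# /16: same subnet = False
# /8: same subnet = True
```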

Many thanks for any help!


http://stackhpc.com/
Please note I work Tuesday to Friday.

