[lustre-discuss] Lnet configuration and debugging
Hans Henrik Happe
happe at nbi.dk
Tue Oct 1 23:23:55 PDT 2024
Hi,
We have a tcp1 config, and our lnet.conf looks like this:
net:
    - net type: tcp1
      local NI(s):
        - nid: <IP>@tcp1
          status: up
          interfaces:
              0: eth0
Replace <IP> with the IP address of the NID. I guess you need "- net type"
instead of just "- net".
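
To spot the key difference mentioned above, here is a minimal Python sketch that scans lnet.conf text for list entries written as "- net:" rather than "- net type:". The function name find_short_net_keys and the sample text are made up for illustration, and the check is a plain line-based pattern match, not a YAML parse. Whether the short key is really what makes the import hang is still only a guess.

```python
import re

# Matches a list entry that uses the short key "net:" instead of "net type:".
# The top-level "net:" line has no leading "-", so it is not matched.
SHORT_NET_KEY = re.compile(r"^\s*-\s+net\s*:")


def find_short_net_keys(conf_text):
    """Return (line_number, stripped_line) pairs for entries written as '- net:'."""
    hits = []
    for number, line in enumerate(conf_text.splitlines(), start=1):
        if SHORT_NET_KEY.match(line):
            hits.append((number, line.strip()))
    return hits


# Hypothetical sample modelled on the config quoted in this thread.
sample = "net:\n    - net: tcp1\n      interfaces:\n          0: eth0\n"

for number, line in find_short_net_keys(sample):
    print(f"line {number}: found '{line}', maybe use '- net type: ...' instead")
```

To check a real file, read it with open("/etc/lnet.conf").read() and pass the text to the function.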
Cheers,
Hans Henrik
On 17/09/2024 11.50, Steve Brasier wrote:
> Hi.
>
> I've got an /etc/lnet.conf on a RockyLinux 9.4 client running
> lustre 2.15.5-1.el9 which has this lnet.conf:
>
> [root@stg-login-0 rocky]# cat /etc/lnet.conf
> net:
>     - net: tcp1
>       interfaces:
>           0: eth0
>
> Running systemctl start lnet just hangs forever, with the syslog just
> showing
> Sep 13 15:31:35 stg-login-0 systemd[1]: Starting lnet management...
>
> and it's actually the command below which hangs:
> [root@stg-login-0 rocky]# /usr/sbin/lnetctl import /etc/lnet.conf
> i.e. module load and lnet configure work OK.
>
> However it looks like it autoconfigured an interface on tcp (not tcp1):
> [root@stg-login-0 rocky]# lnetctl net show
> net:
>     - net type: lo
>       local NI(s):
>         - nid: 0@lo
>           status: up
>     - net type: tcp
>       local NI(s):
>         - nid: 10.179.2.45@tcp
>           status: up
>
> So:
> 1. How can I debug this hanging please?
>
> 2. Do the client and server NIDs need to be in the same IPv4 subnet? I
> have a client NID of 10.179.2.45@tcp1 and a server NID
> of 10.167.128.1@tcp1, with IP routing between them such that ICMP ping
> works between them. Is that OK?
>
> many thanks for any help!
>
>
> http://stackhpc.com/
> Please note I work Tuesday to Friday.
>