[lustre-discuss] Cannot do a ping with LNet over Infiniband
Vinícius Ferrão
ferrao at versatushpc.com.br
Tue Jan 19 17:21:03 PST 2021
Hi Chris, you nailed it.
There’s something wrong with the Infiniband network. The bad news is that I don’t have control of or access to it.
For the testing, I initially did what you suggested with -I on regular ping:
[root at mds1 ~]# ping -I ib0 10.148.0.21
PING 10.148.0.21 (10.148.0.21) from 10.148.0.20 ib0: 56(84) bytes of data.
64 bytes from 10.148.0.21: icmp_seq=1 ttl=64 time=0.067 ms
64 bytes from 10.148.0.21: icmp_seq=2 ttl=64 time=0.079 ms
64 bytes from 10.148.0.21: icmp_seq=3 ttl=64 time=0.078 ms
After that I tried ibping, which seems to be working:
[root at mds1 ~]# ibping -G 0xb8599f0300e34b56
Pong from mds2.localdomain.(none) (Lid 63): time 0.159 ms
Pong from mds2.localdomain.(none) (Lid 63): time 0.177 ms
Pong from mds2.localdomain.(none) (Lid 63): time 0.178 ms
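(For anyone reproducing this: ibping -G takes the remote port GUID, which can be read with `ibstat` on the target node. A small sketch, using a trimmed sample of ibstat output in place of actually running it on mds2; the GUID matches the one used above:)

```shell
# Extract the port GUID that `ibping -G` needs from ibstat output.
# The sample text below stands in for running `ibstat` on mds2.
ibstat_output='CA mlx5_0
        Port 1:
                State: Active
                Port GUID: 0xb8599f0300e34b56'
guid=$(printf '%s\n' "$ibstat_output" | awk -F': ' '/Port GUID/ { print $2 }')
echo "ibping -G $guid"
# prints: ibping -G 0xb8599f0300e34b56
```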
But ib_send_bw was a no-go. I fired up the server on mds2, but it didn’t work from the client side:
[root at mds2 ~]# ib_send_bw
************************************
* Waiting for client to connect... *
************************************
[root at mds1 ~]# ib_send_bw 10.148.0.21 -a
Couldn't connect to 10.148.0.21:18515
Unable to open file descriptor for socket connection
Unable to init the socket connection
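(Worth noting: the error mentions 10.148.0.21:18515, i.e. the plain-TCP rendezvous that perftest uses to exchange connection parameters before any RDMA traffic flows, so this failure points at the IP path rather than the IB fabric itself. A quick hedged check, assuming bash with /dev/tcp support:)

```shell
# Check whether the ib_send_bw rendezvous port is reachable over TCP.
# host/port are taken from the failed run above; adjust as needed.
host=10.148.0.21
port=18515
if timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "port $port on $host reachable"
else
    echo "port $port on $host NOT reachable"
fi
```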
But after this initial testing I observed something strange. On mds2 it took a while to ping mds1:
[root at mds2 ~]# ping 10.148.0.20
PING 10.148.0.20 (10.148.0.20) 56(84) bytes of data.
64 bytes from 10.148.0.20: icmp_seq=7 ttl=64 time=1.48 ms
64 bytes from 10.148.0.20: icmp_seq=8 ttl=64 time=0.172 ms
64 bytes from 10.148.0.20: icmp_seq=9 ttl=64 time=0.169 ms
Note that the first icmp_seq was equal to 7.
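(In other words, the first six echo requests went unanswered, which suggests the path only came up after a few seconds, possibly IPoIB address resolution or multicast setup being slow. A small sketch that pulls that number out of ping’s output; the sample text is the mds2 capture pasted above:)

```shell
# Count the leading ICMP requests that got no reply: the first reply's
# icmp_seq minus one. Sample text is the mds2 ping capture from above.
ping_output='64 bytes from 10.148.0.20: icmp_seq=7 ttl=64 time=1.48 ms
64 bytes from 10.148.0.20: icmp_seq=8 ttl=64 time=0.172 ms
64 bytes from 10.148.0.20: icmp_seq=9 ttl=64 time=0.169 ms'
first_seq=$(printf '%s\n' "$ping_output" | sed -n 's/.*icmp_seq=\([0-9]*\).*/\1/p' | head -n 1)
echo "leading echo requests lost: $((first_seq - 1))"
# prints: leading echo requests lost: 6
```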
After doing this, ib_send_bw just works:
[root at mds1 ~]# ib_send_bw 10.148.0.21
---------------------------------------------------------------------------------------
Send BW Test
Dual-port : OFF Device : mlx5_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
TX depth : 128
CQ Moderation : 1
Mtu : 4096[B]
Link type : IB
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0x3e QPN 0x00c3 PSN 0xfa38c7
remote address: LID 0x3f QPN 0x008a PSN 0xd3f2c
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
Conflicting CPU frequency values detected: 1199.926000 != 3296.337000. CPU Frequency is not max.
65536 1000 11482.36 11473.00 0.183568
---------------------------------------------------------------------------------------
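(For what it’s worth, the measured peak can be sanity-checked against the link rate. Assuming perftest’s MB means 2^20 bytes, its usual units, and that this is a 4x EDR link, the run above lands close to EDR’s roughly 97 Gbit/s effective data rate, so the fabric itself looks healthy once connected:)

```shell
# Convert the peak BW reported above (11482.36 MB/sec, MB = 2^20 bytes)
# to Gbit/s for comparison with the nominal link rate.
awk 'BEGIN { printf "%.1f Gbit/s\n", 11482.36 * 2^20 * 8 / 1e9 }'
# prints: 96.3 Gbit/s
```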
So yeah, the first thing I suspect is OpenSM, but I don’t have access to it. That’s just a guess. Has anyone seen something like this?
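(Even without access to the SM node, `sminfo` from infiniband-diags can be run on any host attached to the fabric to see which LID holds the master SM, which may help whoever does administer it. A sketch that pulls the LID out of an sminfo line; the sample below uses made-up GUID/LID values for illustration:)

```shell
# Extract the master SM's LID from sminfo output. The sample line stands
# in for running `sminfo` on a fabric-attached node; values are made up.
sminfo_output='sminfo: sm lid 1 sm guid 0x248a0703009c3f2a, activity count 48262 priority 15 state 3 SMINFO_MASTER'
sm_lid=$(printf '%s\n' "$sminfo_output" | sed -n 's/.*sm lid \([0-9]*\).*/\1/p')
echo "master SM at LID $sm_lid"
# prints: master SM at LID 1
```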
Thanks.
Vinícius.
> On 19 Jan 2021, at 13:43, Horn, Chris <chris.horn at hpe.com> wrote:
>
> You might try running an IB benchmark tool, e.g. ib_send_bw, between two hosts to verify that the network is working at that level.
>
> Note that the 'ping' command doesn't always send traffic in the manner you might expect. You should use the '-I interface' option to make sure that the pings are being sent over the desired local interface.
>
> Chris Horn
>
> On 1/18/21, 10:39 PM, "lustre-discuss on behalf of Vinícius Ferrão" <lustre-discuss-bounces at lists.lustre.org on behalf of ferrao at versatushpc.com.br> wrote:
>
> Hello,
>
> I’ve been scratching my head for three days now, but I cannot do a simple ping over Infiniband using LNet. To be honest, I have no idea what may be happening. LNet over TCP (on ethernet) seems to work fine. The only way LNet ping works is by pinging itself:
>
> [root at mds1 ~]# lctl ping 10.148.0.20 at o2ib1
> 12345-0 at lo
> 12345-10.24.2.12 at tcp1
> 12345-10.148.0.20 at o2ib1
>
> Everything else just fails:
>
> [root at mds1 ~]# lctl ping 10.148.0.21 at o2ib1
> failed to ping 10.148.0.21 at o2ib1: Input/output error
> [root at mds1 ~]# dmesg -T | tail -n 2
> [Tue Jan 19 01:26:01 2021] LNet: 2424:0:(o2iblnd_cb.c:3405:kiblnd_check_conns()) Timed out tx for 10.148.0.21 at o2ib1: 5095 seconds
> [Tue Jan 19 01:26:01 2021] LNetError: 2362:0:(lib-move.c:2955:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.148.0.21 at o2ib1: -125
>
> I can confirm that IPoIB network is working as expected:
>
> [root at mds1 ~]# ping 10.148.0.21
> PING 10.148.0.21 (10.148.0.21) 56(84) bytes of data.
> 64 bytes from 10.148.0.21: icmp_seq=1 ttl=64 time=2.52 ms
> 64 bytes from 10.148.0.21: icmp_seq=2 ttl=64 time=0.085 ms
>
> Configuration seems to match between the two example machines:
>
> [root at mds1 ~]# ifconfig ib0 | head -n 2
> Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).
> ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 65520
> inet 10.148.0.20 netmask 255.255.0.0 broadcast 10.148.255.255
>
> [root at mds2 ~]# ifconfig ib0 | head -n 2
> Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).
> ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 65520
> inet 10.148.0.21 netmask 255.255.0.0 broadcast 10.148.255.255
>
> Here’s the output of network configuration:
> [root at mds1 ~]# lnetctl net show
> net:
> - net type: lo
> local NI(s):
> - nid: 0 at lo
> status: up
> - net type: tcp1
> local NI(s):
> - nid: 10.24.2.12 at tcp1
> status: up
> interfaces:
> 0: bond0
> - net type: o2ib1
> local NI(s):
> - nid: 10.148.0.20 at o2ib1
> status: up
> interfaces:
> 0: ib0
>
> Modules seem to be loaded:
> [root at mds1 ~]# lsmod | egrep "mlx|mlnx|lnet|rdma|ko2iblnd"
> lnet_selftest 274357 0
> ko2iblnd 238469 1
> lnet 595358 4 ko2iblnd,lnet_selftest,ksocklnd
> libcfs 415577 4 lnet,ko2iblnd,lnet_selftest,ksocklnd
> rdma_ucm 26931 0
> rdma_cm 64252 2 ko2iblnd,rdma_ucm
> iw_cm 43918 1 rdma_cm
> ib_cm 53015 3 rdma_cm,ib_ucm,ib_ipoib
> mlx4_en 142468 0
> mlx4_ib 220791 0
> mlx4_core 361489 2 mlx4_en,mlx4_ib
> mlx5_ib 398193 0
> ib_uverbs 134646 3 mlx5_ib,ib_ucm,rdma_ucm
> ib_core 379808 11 rdma_cm,ib_cm,iw_cm,ko2iblnd,mlx4_ib,mlx5_ib,ib_ucm,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib
> mlx5_core 1113637 1 mlx5_ib
> mlxfw 18227 1 mlx5_core
> devlink 60067 4 mlx4_en,mlx4_ib,mlx4_core,mlx5_core
> mlx_compat 47141 15 rdma_cm,ib_cm,iw_cm,ko2iblnd,mlx4_en,mlx4_ib,mlx5_ib,ib_ucm,ib_core,ib_umad,ib_uverbs,mlx4_core,mlx5_core,rdma_ucm,ib_ipoib
> ptp 23551 3 i40e,mlx4_en,mlx5_core
>
> Both systems were running CentOS 7.9, Lustre 2.12.6 (IB Branch) and Mellanox OFED 4.9-2.2.4.0.
>
> The only error messages that I’ve found are the ones I pasted at the start of this message from dmesg, plus the I/O error.
>
> Any help is greatly appreciated.
> Thanks,
> Vinícius.
>
>
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>