[lustre-discuss] Cannot do a ping with LNet over Infiniband

Vinícius Ferrão ferrao at versatushpc.com.br
Tue Jan 19 17:21:03 PST 2021


Hi Chris, you nailed it.

There’s something wrong with the InfiniBand network. The bad news is that I have neither control of nor access to it.

For testing, I first did what you suggested with -I on a plain ping:

[root@mds1 ~]# ping -I ib0 10.148.0.21
PING 10.148.0.21 (10.148.0.21) from 10.148.0.20 ib0: 56(84) bytes of data.
64 bytes from 10.148.0.21: icmp_seq=1 ttl=64 time=0.067 ms
64 bytes from 10.148.0.21: icmp_seq=2 ttl=64 time=0.079 ms
64 bytes from 10.148.0.21: icmp_seq=3 ttl=64 time=0.078 ms

Then I tried ibping, which seems to work:
[root@mds1 ~]# ibping -G 0xb8599f0300e34b56
Pong from mds2.localdomain.(none) (Lid 63): time 0.159 ms
Pong from mds2.localdomain.(none) (Lid 63): time 0.177 ms
Pong from mds2.localdomain.(none) (Lid 63): time 0.178 ms
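
(For reference, the port GUID passed to -G above is the one the target reports locally; assuming infiniband-diags is installed, it can be read with something like:)

[root@mds2 ~]# ibstat mlx5_0 | grep 'Port GUID'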

But ib_send_bw was a no-go: I fired up the server on mds2, but the client side couldn’t connect:

[root@mds2 ~]# ib_send_bw

************************************
* Waiting for client to connect... *
************************************

[root@mds1 ~]# ib_send_bw 10.148.0.21 -a
Couldn't connect to 10.148.0.21:18515
Unable to open file descriptor for socket connection Unable to init the socket connection
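
(As far as I know, the perftest tools exchange connection parameters over a plain TCP socket, port 18515 by default; cf. the "Data ex. method : Ethernet" line below. So this failure looks like TCP over IPoIB rather than a verbs-level problem. A quick reachability check while the server is listening, assuming nmap-ncat is available:)

[root@mds1 ~]# nc -zv 10.148.0.21 18515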

After this initial testing, I observed something strange: on mds2 it took a while for pings to mds1 to get a response:

[root@mds2 ~]# ping 10.148.0.20
PING 10.148.0.20 (10.148.0.20) 56(84) bytes of data.
64 bytes from 10.148.0.20: icmp_seq=7 ttl=64 time=1.48 ms
64 bytes from 10.148.0.20: icmp_seq=8 ttl=64 time=0.172 ms
64 bytes from 10.148.0.20: icmp_seq=9 ttl=64 time=0.169 ms

Note that the first icmp_seq to get a reply was 7: the first six packets went unanswered, presumably while address resolution on ib0 completed.
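
(If that delay is IPoIB neighbour discovery, it should show up in the kernel neighbour table; a quick check with plain iproute2, nothing Lustre-specific:

[root@mds2 ~]# ip neigh show dev ib0

Entries stuck in INCOMPLETE or FAILED would point at ARP over IPoIB, which relies on multicast groups programmed by the subnet manager.)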

After doing this, ib_send_bw just works:
[root@mds1 ~]# ib_send_bw 10.148.0.21
---------------------------------------------------------------------------------------
                    Send BW Test
 Dual-port       : OFF		Device         : mlx5_0
 Number of qps   : 1		Transport type : IB
 Connection type : RC		Using SRQ      : OFF
 PCIe relax order: ON
 TX depth        : 128
 CQ Moderation   : 1
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs	 : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x3e QPN 0x00c3 PSN 0xfa38c7
 remote address: LID 0x3f QPN 0x008a PSN 0xd3f2c
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
Conflicting CPU frequency values detected: 1199.926000 != 3296.337000. CPU Frequency is not max.
 65536      1000             11482.36            11473.00		   0.183568
---------------------------------------------------------------------------------------

So yeah, the first thing I suspect is OpenSM, but I don’t have access to it. That’s just a guess. Has anyone seen something like this?
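
(Even without access to the SM node itself, it can at least be identified and its state queried from any host with infiniband-diags, e.g.:

[root@mds1 ~]# sminfo

That should report the SM’s LID, GUID, priority and state, which at least tells you where to point the finger.)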

Thanks.
Vinícius.

> On 19 Jan 2021, at 13:43, Horn, Chris <chris.horn at hpe.com> wrote:
> 
> You might try running an IB benchmark tool, e.g. ib_send_bw, between two hosts to verify that the network is working at that level.
> 
> Note that the 'ping' command doesn't always send traffic in the manner you might expect. You should use the '-I interface' option to make sure that the pings are being sent over the desired local interface.
> 
> Chris Horn
> 
> On 1/18/21, 10:39 PM, "lustre-discuss on behalf of Vinícius Ferrão" <lustre-discuss-bounces at lists.lustre.org on behalf of ferrao at versatushpc.com.br> wrote:
> 
>    Hello,
> 
>    I’ve been scratching my head for three days now, but I cannot do a simple ping over InfiniBand using LNet. To be honest, I have no idea what may be happening. LNet over TCP (on Ethernet) seems to work fine. The only way an LNet ping works is by pinging the node itself:
> 
>    [root@mds1 ~]# lctl ping 10.148.0.20@o2ib1
>    12345-0@lo
>    12345-10.24.2.12@tcp1
>    12345-10.148.0.20@o2ib1
> 
>    Everything else just fails:
> 
>    [root@mds1 ~]# lctl ping 10.148.0.21@o2ib1
>    failed to ping 10.148.0.21@o2ib1: Input/output error
>    [root@mds1 ~]# dmesg -T | tail -n 2
>    [Tue Jan 19 01:26:01 2021] LNet: 2424:0:(o2iblnd_cb.c:3405:kiblnd_check_conns()) Timed out tx for 10.148.0.21@o2ib1: 5095 seconds
>    [Tue Jan 19 01:26:01 2021] LNetError: 2362:0:(lib-move.c:2955:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.148.0.21@o2ib1: -125
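> 
>    (Error -125 is ECANCELED, i.e. the queued GET was dropped after the transmit timed out. The peer’s connection state can be inspected with lnetctl; a sketch, assuming the 2.12 lnetctl:
> 
>    [root@mds1 ~]# lnetctl peer show --nid 10.148.0.21@o2ib1
> 
>    )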
> 
>    I can confirm that the IPoIB network is working as expected:
> 
>    [root@mds1 ~]# ping 10.148.0.21
>    PING 10.148.0.21 (10.148.0.21) 56(84) bytes of data.
>    64 bytes from 10.148.0.21: icmp_seq=1 ttl=64 time=2.52 ms
>    64 bytes from 10.148.0.21: icmp_seq=2 ttl=64 time=0.085 ms
> 
>    Configuration seems to match between the two machines:
> 
>    [root@mds1 ~]# ifconfig ib0 | head -n 2
>    Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).
>    ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65520
>            inet 10.148.0.20  netmask 255.255.0.0  broadcast 10.148.255.255
> 
>    [root@mds2 ~]# ifconfig ib0 | head -n 2
>    Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).
>    ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65520
>            inet 10.148.0.21  netmask 255.255.0.0  broadcast 10.148.255.255
> 
>    Here’s the network configuration:
>    [root@mds1 ~]# lnetctl net show
>    net:
>        - net type: lo
>          local NI(s):
>            - nid: 0@lo
>              status: up
>        - net type: tcp1
>          local NI(s):
>            - nid: 10.24.2.12@tcp1
>              status: up
>              interfaces:
>                  0: bond0
>        - net type: o2ib1
>          local NI(s):
>            - nid: 10.148.0.20@o2ib1
>              status: up
>              interfaces:
>                  0: ib0
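> 
>    (For reference, that corresponds to a networks line along these lines in /etc/modprobe.d/lustre.conf, or the equivalent in /etc/lnet.conf; a sketch, not the actual file:
> 
>    options lnet networks="tcp1(bond0),o2ib1(ib0)"
> 
>    )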
> 
>    Modules seem to be loaded:
>    [root@mds1 ~]# lsmod | egrep "mlx|mlnx|lnet|rdma|ko2iblnd"
>    lnet_selftest         274357  0 
>    ko2iblnd              238469  1 
>    lnet                  595358  4 ko2iblnd,lnet_selftest,ksocklnd
>    libcfs                415577  4 lnet,ko2iblnd,lnet_selftest,ksocklnd
>    rdma_ucm               26931  0 
>    rdma_cm                64252  2 ko2iblnd,rdma_ucm
>    iw_cm                  43918  1 rdma_cm
>    ib_cm                  53015  3 rdma_cm,ib_ucm,ib_ipoib
>    mlx4_en               142468  0 
>    mlx4_ib               220791  0 
>    mlx4_core             361489  2 mlx4_en,mlx4_ib
>    mlx5_ib               398193  0 
>    ib_uverbs             134646  3 mlx5_ib,ib_ucm,rdma_ucm
>    ib_core               379808  11 rdma_cm,ib_cm,iw_cm,ko2iblnd,mlx4_ib,mlx5_ib,ib_ucm,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib
>    mlx5_core            1113637  1 mlx5_ib
>    mlxfw                  18227  1 mlx5_core
>    devlink                60067  4 mlx4_en,mlx4_ib,mlx4_core,mlx5_core
>    mlx_compat             47141  15 rdma_cm,ib_cm,iw_cm,ko2iblnd,mlx4_en,mlx4_ib,mlx5_ib,ib_ucm,ib_core,ib_umad,ib_uverbs,mlx4_core,mlx5_core,rdma_ucm,ib_ipoib
>    ptp                    23551  3 i40e,mlx4_en,mlx5_core
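> 
>    (The NIDs LNet actually configured can also be double-checked quickly with:
> 
>    [root@mds1 ~]# lctl list_nids
> 
>    )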
> 
>    Both systems were running CentOS 7.9, Lustre 2.12.6 (IB Branch) and Mellanox OFED 4.9-2.2.4.0.
> 
>    The only error messages that I’ve found are the dmesg lines pasted at the start of this message and the I/O error from lctl ping.
> 
>    Any help is greatly appreciated.
>    Thanks,
>    Vinícius.
> 
> 
> 
>    _______________________________________________
>    lustre-discuss mailing list
>    lustre-discuss at lists.lustre.org
>    http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org 
> 


