[lustre-discuss] 2.15.4 o2iblnd on RoCEv2?

Jeff Johnson jeff.johnson at aeoncomputing.com
Tue Jan 9 19:45:16 PST 2024


Howdy intrepid Lustrefarians,

While starting down the debug rabbit hole I thought I'd raise my hand
and see if anyone has a few magic beans to spare.

I cannot get LNet (via lnetctl) to initialize an o2iblnd interface on a
RoCEv2 interface.

Running `lnetctl net add --net ib0 --if enp1s0np0` results in:
 net:
          errno: -1
          descr: cannot parse net '<255:65535>'

Nothing in dmesg to indicate why. Search engines aren't coughing up
much here either.

Env: Rocky 8.9 x86_64, MOFED 5.8-4.1.5.0, Lustre 2.15.4

I'm able to run MPI over the RoCEv2 interface. Utilities like ibstatus and
ibdev2netdev report it correctly, and ibv_rc_pingpong works fine between
nodes.

Configuring as socklnd works fine:
`lnetctl net add --net tcp0 --if enp1s0np0 && lnetctl net show`
[root@r2u11n3 ~]# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: 10.0.50.27@tcp
          status: up
          interfaces:
              0: enp1s0np0

I verified that the interface is in RoCEv2 mode using NVIDIA's `cma_roce_mode`
as well as the sysfs references:

[root@r2u11n3 ~]# cma_roce_mode -d mlx5_0 -p 1
RoCE v2

Ideas? Suggestions? Incense?

Thanks,

--Jeff

