[lustre-discuss] 2.15.4 o2iblnd on RoCEv2?

Andreas Dilger adilger at whamcloud.com
Wed Jan 10 12:06:03 PST 2024


It seems the error message could be improved in this case: "cannot parse net '<255:65535>'" gives no hint that the --net name was the problem. Could you file an LU ticket for that with the reproducer below, ideally along with a patch?

Cheers, Andreas
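
Until the message is improved, a small pre-check on the --net argument can catch this kind of typo before lnetctl sees it. The following is only a sketch: the function name lnet_name_ok is hypothetical, and the list of LND prefixes (tcp, o2ib, gni) is a partial assumption, not the full set LNet accepts.

```shell
#!/bin/sh
# Sketch only: sanity-check an LNet network name before handing it to
# `lnetctl net add --net`. The function name is hypothetical and the
# LND prefix list below is a partial assumption (not exhaustive).

# Returns 0 if $1 is a known LND prefix followed by an optional
# network number (e.g. o2ib0, tcp1, o2ib); returns non-zero otherwise.
lnet_name_ok() {
    printf '%s\n' "$1" | grep -Eq '^(tcp|o2ib|gni)[0-9]*$'
}

# Example use (commented out so the sketch has no side effects):
#   net=o2ib0
#   if lnet_name_ok "$net"; then
#       lnetctl net add --net "$net" --if enp1s0np0
#   else
#       echo "unrecognized LNet network name: $net" >&2
#   fi
```

With this check, "ib0" is rejected up front while "o2ib0" and "tcp0" pass through to lnetctl unchanged.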

> On Jan 10, 2024, at 11:37, Jeff Johnson <jeff.johnson at aeoncomputing.com> wrote:
> 
> Man am I an idiot. Been up all night too many nights in a row and not
> enough coffee. It helps if you use the correct --net designation. I
> was typing ib0 instead of o2ib0. Declaring as o2ib0 works fine.
> 
> (cleanup from previous)
> lctl net down && lustre_rmmod
> 
> (new attempt)
> modprobe lnet -v
> lnetctl lnet configure
> lnetctl net add --if enp1s0np0 --net o2ib0
> lnetctl net show
> net:
>    - net type: lo
>      local NI(s):
>        - nid: 0@lo
>          status: up
>    - net type: o2ib
>      local NI(s):
>        - nid: 10.0.50.27@o2ib
>          status: up
>          interfaces:
>              0: enp1s0np0
> 
> Lots more to test and verify but the original mailing list submission
> was total pilot error on my part. Apologies to all who spent cycles
> pondering this nothingburger.
> 
>> On Tue, Jan 9, 2024 at 7:45 PM Jeff Johnson
>> <jeff.johnson at aeoncomputing.com> wrote:
>> 
>> Howdy intrepid Lustrefarians,
>> 
>> While starting down the debug rabbit hole I thought I'd raise my hand
>> and see if anyone has a few magic beans to spare.
>> 
>> I cannot get lnet (via lnetctl) to init an o2iblnd interface on a
>> RoCEv2 interface.
>> 
>> Running `lnetctl net add --net ib0 --if enp1s0np0` results in
>> net:
>>          errno: -1
>>          descr: cannot parse net '<255:65535>'
>> 
>> Nothing in dmesg to indicate why. Search engines aren't coughing up
>> much here either.
>> 
>> Env: Rocky 8.9 x86_64, MOFED 5.8-4.1.5.0, Lustre 2.15.4
>> 
>> I'm able to run MPI over the RoCEv2 interface. Utils like ibstatus and
>> ibdev2netdev report it correctly. ibv_rc_pingpong works fine between
>> nodes.
>> 
>> Configuring as socklnd works fine. `lnetctl net add --net tcp0 --if
>> enp1s0np0 && lnetctl net show`
>> [root@r2u11n3 ~]# lnetctl net show
>> net:
>>    - net type: lo
>>      local NI(s):
>>        - nid: 0@lo
>>          status: up
>>    - net type: tcp
>>      local NI(s):
>>        - nid: 10.0.50.27@tcp
>>          status: up
>>          interfaces:
>>              0: enp1s0np0
>> 
>> I verified the RoCEv2 interface using NVIDIA's `cma_roce_mode` as well
>> as sysfs references:
>> 
>> [root@r2u11n3 ~]# cma_roce_mode -d mlx5_0 -p 1
>> RoCE v2
>> 
>> Ideas? Suggestions? Incense?
>> 
>> Thanks,
>> 
>> --Jeff
> 
> 
> 
> --
> ------------------------------
> Jeff Johnson
> Co-Founder
> Aeon Computing
> 
> jeff.johnson at aeoncomputing.com
> www.aeoncomputing.com
> t: 858-412-3810 x1001   f: 858-412-3845
> m: 619-204-9061
> 
> 4170 Morena Boulevard, Suite C - San Diego, CA 92117
> 
> High-Performance Computing / Lustre Filesystems / Scale-out Storage