[lustre-discuss] Lnet not going up with InfiniHost III Lx HCA card

Ramiro Alba Queipo ramiro.alba at upc.edu
Mon Feb 10 01:15:38 PST 2025


Jesse,

Thank you very much for your answer. I tried your recommendation of
setting map_on_demand=0, but it does not work. Anyway, as this is only a
means of testing the Lustre installation, I can manage with ksocklnd over
Ethernet, which does work.
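
For anyone following along, the two setups mentioned above might look
roughly like the sketch below. It rests on assumptions: the file name and
the Ethernet interface name "eth0" are placeholders, and map_on_demand is
set on ko2iblnd because it is a parameter of that module.

```conf
# /etc/modprobe.d/lustre.conf  (file name is an assumption)

# Option A: o2ib over InfiniBand with the suggested map_on_demand=0.
# map_on_demand belongs to the ko2iblnd module, not to lnet.
# options lnet networks=o2ib0(ib0)
# options ko2iblnd map_on_demand=0

# Option B: fall back to ksocklnd over Ethernet.
# "eth0" is a placeholder; substitute the real interface name.
options lnet networks=tcp0(eth0)
```

After changing the file, the lnet modules need to be unloaded and loaded
again before "lctl network up" picks up the new options.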

Best regards

On Fri, 7 Feb 2025 at 19:24, Jesse Stroik <jesse.stroik at ssec.wisc.edu>
wrote:

> Hi Ramiro,
>
> The "Invalid mr size" error suggests you're running into a limit with your
> card when the RDMA (o2ib) LND is set up while bringing up the network.
> There may be adjustments or workarounds for it, possibly including setting
> map_on_demand=0 as an argument to the ko2iblnd module.
>
> And since you are using older IB hardware on a newer OS, just a heads up.
> We recently ran into an issue with ConnectX-3 IB cards after upgrading our
> operating systems: RDMA communication became unreliable, possibly because
> the cards would often exceed the number of connection queue pairs they
> could create. For us, the workaround was to use ksocklnd instead of
> o2iblnd. If you have trouble getting the o2ib Lustre network driver to
> work with this older hardware due to RDMA problems, that could be a
> workaround, although it may not be feasible depending on your networking
> setup.
>
> Best,
> Jesse
>
>
> ________________________________________
> From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf
> of Ramiro Alba Queipo <ramiro.alba at upc.edu>
> Sent: Thursday, February 6, 2025 3:34 AM
> To: lustre-discuss at lists.lustre.org
> Subject: [lustre-discuss] Lnet not going up with InfiniHost III Lx HCA card
>
>
> Hi all,
>
> I am testing an Ubuntu 24.04 (6.8.0-52-generic) client with Lustre 2.16.1
> over InfiniBand, using an old Mellanox DDR card (InfiniHost III Lx HCA).
>
> - # ip -br a
>
>       options lnet networks=o2ib0(ib0)
>
> - # modprobe lnet
> - # lctl network up
>
>      LNET configure error 100: Network is down
>
> - # tail -10 /var/log/kernel.log
>
>      LNetError: 5071:0:(o2iblnd.c:2866:kiblnd_hdev_get_attr()) Invalid mr size: 0xffffffffffffffff
>      LNetError: 5071:0:(o2iblnd.c:3103:kiblnd_dev_failover()) Can't get device attributes: -22
>      LNetError: 5071:0:(o2iblnd.c:3831:kiblnd_startup()) ko2iblnd: Can't initialize device: rc = -22
>      LNetError: Error -100 starting up LNI o2ib
>
> Lustre 2.15.0 on Ubuntu 20.04 (kernel 5.4.0-198-generic) works fine with
> the same hardware.
>
> Can anyone give me some advice or ideas on how to make it work?
>
> Thanks in advance
> Best regards
>
> --
> Ramiro Alba
>
> Centre Tecnològic de Tranferència de Calor
> http://www.cttc.upc.edu
>
> Escola Tècnica Superior d'Enginyeries
> Industrial i Aeronàutica de Terrassa
> Colom 11, E-08222, Terrassa, Barcelona, Spain
> Tel: (+34) 93 739 8928
>


-- 
Ramiro Alba

Centre Tecnològic de Tranferència de Calor
http://www.cttc.upc.edu

Escola Tècnica Superior d'Enginyeries
Industrial i Aeronàutica de Terrassa
Colom 11, E-08222, Terrassa, Barcelona, Spain
Tel: (+34) 93 739 8928