[lustre-discuss] Does Lustre support RoCE?

Oucharek, Doug S doug.s.oucharek at intel.com
Fri May 12 17:52:11 PDT 2017


I’ve been able to determine what is causing the dump_cqe failures, but not why it is happening now (all of a sudden).

In Lustre, we pass an IOV of fragments to be RDMA’ed over IB.  The fragments must be page aligned, except that the first fragment does not have to start on a page boundary and the last fragment does not have to end on one.

When we set up the DMA addresses for remote RDMA, we mask off the fragments so the addresses all fall on a page boundary.  I guess the original authors believed that all DMA addresses needed to be page aligned for IB hardware.  The mlx5 code (MOFED 4 specific?) does not like that we are not using the actual start address and rejects it with a dump_cqe error.
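
To make that concrete, below is a rough userspace sketch of the kind of masking I am describing.  This is not the actual o2iblnd code; the structure and names are invented purely for illustration.

/* Purely illustrative sketch of the masking described above -- not the
 * actual o2iblnd code; names are made up. */
#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL
#define PAGE_MASK (~(PAGE_SIZE - 1))

struct rdma_frag {
    uint64_t addr;  /* DMA address handed to the HCA */
    uint32_t len;   /* fragment length in bytes */
};

/* Round each fragment's address down to a page boundary and extend its
 * length so the same bytes are still covered.  MOFED 3.x tolerated this,
 * but mlx5 under MOFED 4 appears to reject it (dump_cqe) because the
 * address no longer matches the real start of the buffer. */
static void align_frags(struct rdma_frag *frags, int nfrags)
{
    int i;

    for (i = 0; i < nfrags; i++) {
        uint64_t off = frags[i].addr & ~PAGE_MASK;

        frags[i].addr &= PAGE_MASK;
        frags[i].len  += (uint32_t)off;
    }
}

int main(void)
{
    struct rdma_frag f = { 0x12345678abcULL, 512 };

    align_frags(&f, 1);
    printf("addr=0x%llx len=%u\n", (unsigned long long)f.addr, f.len);
    return 0;
}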

This code does not seem to be a problem with MOFED 3.x, so has something changed?  Has a page-alignment restriction been removed?  I cannot simply turn off this alignment operation, as I have no idea what would break elsewhere in the world of OFED/MOFED.

I could use some insight from people who understand IB hardware/firmware.

Doug

On May 11, 2017, at 11:26 AM, Indivar Nair <indivar.nair at techterra.in> wrote:

Thanks for the advice.
I had a hunch that the development will take time.

Regards,


Indivar Nair

On Thu, May 11, 2017 at 11:28 PM, Oucharek, Doug S <doug.s.oucharek at intel.com> wrote:
As I write this, I am banging my head against this wall trying to figure it out.  It is related to the new memory-region registration process used by mlx5 cards.  I could really use the help of any Mellanox/RDMA experts out there.  The API has virtually no documentation, and without the source code for MOFED 4 I am really unable to do much more than guess at what is going on.

So, expect this to take a long time to resolve, and stick with MOFED 3.x.

Doug

On May 11, 2017, at 10:29 AM, Indivar Nair <indivar.nair at techterra.in> wrote:

Thanks a lot, Michael, Andreas, Simon, Doug,
I have already installed MLNX OFED 4 :-(
I will now have to undo it and install the earlier version.

Roughly, by when would the support for MLNX OFED 4 be available?

Regards,


Indivar Nair

On Thu, May 11, 2017 at 9:35 PM, Oucharek, Doug S <doug.s.oucharek at intel.com> wrote:
Regarding the note about MOFED 4 not being supported by Lustre: I’m working on it.  MOFED 4 did not drop support for Lustre, but it did make API/behaviour changes that Lustre has not fully adapted to yet.  The ball is in the Lustre community’s court on this one now.

Doug

On May 11, 2017, at 8:47 AM, Simon Guilbault <simon.guilbault at calculquebec.ca> wrote:

Hi, your lnet.conf looks fine.  I tested LNet with RoCE v2 a while back on a pair of servers using ConnectX-4 cards with a single 25Gb interface, and RDMA was working with CentOS 7.3, the stock RHEL OFED, and Lustre 2.9.  The only setting I had to use in Lustre's config was this one:

options lnet networks=o2ib(ens2)

Without any tuning, the performance with the lnet self-test was about the same (1.9 GB/s), but the CPU utilisation was a lot lower with RDMA than with TCP (3% vs 65% of a core).
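
For reference, the self-test was along the lines of the following (a rough sketch rather than my exact script; the lnet_selftest module needs to be loaded on both nodes first, the NIDs 10.0.0.1@o2ib and 10.0.0.2@o2ib are placeholders, and the exact lst options are documented in the Lustre manual):

export LST_SESSION=$$
lst new_session rw_test
lst add_group servers 10.0.0.1@o2ib
lst add_group clients 10.0.0.2@o2ib
lst add_batch bulk
lst add_test --batch bulk --from clients --to servers brw write size=1M
lst run bulk
sleep 30; lst stat clients servers
lst end_session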

From the notes I took back then, Lustre needed to be recompiled against MLNX OFED 3.4, and MLNX OFED 4 dropped support for Lustre according to its release notes.

Ref 965588
https://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_Release_Notes_4_0-2_0_0_1.pdf
https://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_Release_Notes_4_0-2_0_2_0.pdf


On Thu, May 11, 2017 at 11:34 AM, Indivar Nair <indivar.nair at techterra.in> wrote:
So I should add something like this to lnet.conf:

options lnet networks=o2ib0(p4p1)

That's it, right?

Regards,


Indivar Nair

On Thu, May 11, 2017 at 8:39 PM, Dilger, Andreas <andreas.dilger at intel.com> wrote:
If you have RoCE cards and configure them with OFED, and then configure Lustre to use o2iblnd, it should use RDMA for those interfaces.  The fact that they are RoCE cards is hidden below OFED.
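
Once the module option is in place you can sanity-check it with something like the following, where 10.0.0.1@o2ib is just a placeholder for a peer's NID:

lctl list_nids
lctl ping 10.0.0.1@o2ib

list_nids should report a NID ending in @o2ib (rather than @tcp) if o2iblnd came up on the RoCE interface.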

Cheers, Andreas

> On May 11, 2017, at 08:36, Indivar Nair <indivar.nair at techterra.in> wrote:
>
> Hi ...,
>
> I have read in different forums and blogs that Lustre supports RoCE.
> But I can't find any documentation on it.
>
> I have a Lustre setup with 6 OSS and 2 SMB/NFS Gateways.
> They are all interconnected using a Mellanox SN2700 100G switch and Mellanox ConnectX-4 100G NICs.
> I have installed the Mellanox OFED drivers, but I can't find a way to tell Lustre/LNet to use RoCE.
>
> How do I go about it?
>
> Regards,
>
>
> Indivar Nair
>
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


