[lustre-discuss] Does Lustre support RoCE?

Indivar Nair indivar.nair at techterra.in
Sat May 27 05:20:21 PDT 2017


Hi ...,

Continuing on the earlier mails -
I have installed MOFED 3.4-2.1.8.0-rhel7.3 and recompiled Lustre (Server
and Client) to use it.

I have configured bonding (mode 2 (balance-xor), xmit_hash_policy:
layer2+3).
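
For reference, the bond on each node is set up roughly as follows; the slave
port names and IP addressing below are placeholders, not the exact values used:

# /etc/sysconfig/network-scripts/ifcfg-bond0  (each slave port then gets a
# matching ifcfg-<port> file with MASTER=bond0 and SLAVE=yes)
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
BONDING_OPTS="mode=balance-xor xmit_hash_policy=layer2+3 miimon=100"
BOOTPROTO=none
IPADDR=10.0.0.11
PREFIX=24
ONBOOT=yes
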
MOFED tools like ib_write_bw show 97 Gbps between any two nodes.
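
A perftest check of that sort looks roughly like this (the HCA name mlx5_0 and
the target IP are illustrative and may need adjusting for the actual setup):

# on node A (server side)
ib_write_bw -d mlx5_0 --report_gbits
# on node B (client side), pointing at node A's IP on the bond
ib_write_bw -d mlx5_0 --report_gbits 10.0.0.11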

However, there is no improvement in Lustre performance with MOFED+RoCE.

The obdfilter-survey 'disk' test shows 3.5 - 4 GB/s read and 1.2 - 1.4 GB/s
write per OSS.
With 6 OSSs, that would be around 21 GB/s read and 7.2 GB/s write in aggregate.
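
The survey was run in the usual way, directly on each OSS, something like the
following (the thread/object counts and size here are only indicative):

# case=disk measures the OST backend throughput without involving the network
nobjhi=2 thrhi=64 size=8192 case=disk sh obdfilter-survey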

With TCP, I was getting 6 GB/s read and 6 GB/s write using TWO 100 Gbps
clients (i.e. 3 GB/s per client).
It is the same with MOFED+RoCE.
Absolutely no improvement.

I am hoping to get at least 16-18 GB/s READ speed using 2 clients (i.e.
8-9 GB/s per client).

Are there any specific settings to tune Lustre+RoCE to use the full
bandwidth?
Does anyone have any specific experience with 100 Gbps Ethernet NICs and
Lustre?
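
For example, are the ko2iblnd module parameters the place to look? Something
along these lines, where the values below are only guesses on my part and not
anything I have validated:

# /etc/modprobe.d/ko2iblnd.conf  (illustrative values, not validated)
options ko2iblnd peer_credits=128 peer_credits_hiw=64 credits=1024 \
        concurrent_sends=256 ntx=2048 map_on_demand=32 \
        fmr_pool_size=2048 fmr_flush_trigger=512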

Regards,


Indivar Nair


On Thu, May 11, 2017 at 11:56 PM, Indivar Nair <indivar.nair at techterra.in>
wrote:

> Thanks for the advice.
> I had a hunch that the development would take time.
>
> Regards,
>
>
> Indivar Nair
>
> On Thu, May 11, 2017 at 11:28 PM, Oucharek, Doug S <
> doug.s.oucharek at intel.com> wrote:
>
>> As I write this, I am banging my head against this wall trying to figure
>> it out.  It is related to the new memory region registration process used
>> by mlx5 cards.  I could really use the help of any Mellanox/RDMA experts
>> out there.  The API has virtually no documentation and without the source
>> code for MOFED 4, I am really unable to do much more than guess at what
>> is going on.
>>
>> So, expect this to take a long time to resolve and stick with MOFED 3.x.
>>
>> Doug
>>
>> On May 11, 2017, at 10:29 AM, Indivar Nair <indivar.nair at techterra.in>
>> wrote:
>>
>> Thanks a lot, Michael, Andreas, Simon, Doug,
>> I have already installed MLNX OFED 4 :-(
>> I will now have to undo it and install the earlier version.
>>
>> Roughly, by when would the support for MLNX OFED 4 be available?
>>
>> Regards,
>>
>>
>> Indivar Nair
>>
>> On Thu, May 11, 2017 at 9:35 PM, Oucharek, Doug S <
>> doug.s.oucharek at intel.com> wrote:
>>
>>> Regarding the note about MOFED 4 not being supported by Lustre: I'm working
>>> on it. MOFED 4 did not drop support for Lustre, but it did make
>>> API/behaviour changes which Lustre has not fully adapted to yet. The ball
>>> is in the Lustre community's court on this one now.
>>>
>>> Doug
>>>
>>> On May 11, 2017, at 8:47 AM, Simon Guilbault <
>>> simon.guilbault at calculquebec.ca> wrote:
>>>
>>> Hi, your lnet.conf looks fine. I tested LNet with RoCE v2 a while back
>>> with a pair of servers using ConnectX-4 with a single 25Gb interface, and
>>> RDMA was working with CentOS 7.3, stock RHEL OFED and Lustre 2.9. The only
>>> setting that I had to use in Lustre's config was this one:
>>>
>>> options lnet networks=o2ib(ens2)
>>>
>>> The performance was about the same (1.9 GB/s) without any tuning in the
>>> lnet self-test, but the CPU utilisation was a lot lower with RDMA than with
>>> TCP (3% vs 65% of a core).
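>>>
>>> A minimal lnet_selftest run of that sort looks roughly like this (the NIDs
>>> below are placeholders):
>>>
>>> modprobe lnet_selftest
>>> export LST_SESSION=$$
>>> lst new_session rw_test
>>> lst add_group servers 10.0.0.11@o2ib
>>> lst add_group clients 10.0.0.21@o2ib
>>> lst add_batch bulk
>>> lst add_test --batch bulk --from clients --to servers brw read size=1M
>>> lst run bulk
>>> lst stat clients servers & sleep 30; kill $!
>>> lst end_session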
>>>
>>> From the notes I took back then, Lustre needed to be recompiled with MLNX
>>> OFED 3.4, and MLNX OFED 4 dropped support for Lustre according to their
>>> release notes.
>>>
>>> Ref 965588
>>> https://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_Release_Notes_4_0-2_0_0_1.pdf
>>> https://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_Release_Notes_4_0-2_0_2_0.pdf
>>>
>>>
>>> On Thu, May 11, 2017 at 11:34 AM, Indivar Nair <indivar.nair at techterra.in>
>>> wrote:
>>>
>>>> So I should add something like this in lnet.conf -
>>>>
>>>> options lnet networks=o2ib0(p4p1)
>>>>
>>>> That's it, right?
>>>>
>>>> Regards,
>>>>
>>>>
>>>> Indivar Nair
>>>>
>>>> On Thu, May 11, 2017 at 8:39 PM, Dilger, Andreas <andreas.dilger at intel.com>
>>>> wrote:
>>>>
>>>>> If you have RoCE cards and configure them with OFED, and configure
>>>>> Lustre to use o2iblnd, then it should use RDMA for those interfaces. The
>>>>> fact that they are RoCE cards is hidden below OFED.
>>>>>
>>>>> Cheers, Andreas
>>>>>
>>>>> > On May 11, 2017, at 08:36, Indivar Nair <indivar.nair at techterra.in>
>>>>> wrote:
>>>>> >
>>>>> > Hi ...,
>>>>> >
>>>>> > I have read in different forums and blogs that Lustre supports RoCE.
>>>>> > But I can't find any documentation on it.
>>>>> >
>>>>> > I have a Lustre setup with 6 OSS and 2 SMB/NFS Gateways.
>>>>> > They are all interconnected using a Mellanox SN2700 100G switch and
>>>>> > Mellanox ConnectX-4 100G NICs.
>>>>> > I have installed the Mellanox OFED drivers, but I can't find a way to
>>>>> > tell Lustre / LNet to use RoCE.
>>>>> >
>>>>> > How do I go about it?
>>>>> >
>>>>> > Regards,
>>>>> >
>>>>> >
>>>>> > Indivar Nair
>>>>> >
>>>>> >
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>