[lustre-discuss] LNET Multi-rail

Riccardo Veraldi Riccardo.Veraldi at cnaf.infn.it
Mon Apr 16 06:46:40 PDT 2018


On 4/10/18 7:15 AM, Hans Henrik Happe wrote:
> Thanks for the info. A few observations I found so far:
>
> - I think LU-10297 has solved my stability issues.
> - lustre.conf does work with comma-separated interfaces, e.g.
> o2ib(ib0,ib1). However, peers need to be configured with lnet.conf or
> lnetctl.
Personally, I just use lnet.conf (via lnetctl); I no longer use lustre.conf.
I define all my interfaces and peers in lnet.conf and they are loaded from it.

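A minimal sketch of the lnetctl workflow described above, written in Python only to print the commands one might run on a node with two IB ports. It assumes the 2.10-era lnetctl syntax ('lnetctl net add --net ... --if ...', 'lnetctl peer add --prim_nid ... --nid ...', 'lnetctl export'); verify it against 'lnetctl --help' on your release. The helper build_lnetctl_cmds, the MultiRailPeer class and all NIDs are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MultiRailPeer:
    """Hypothetical record: a peer's primary NID plus its additional NIDs."""
    primary_nid: str
    other_nids: list[str] = field(default_factory=list)


def build_lnetctl_cmds(net: str, interfaces: list[str],
                       peers: list[MultiRailPeer],
                       export_path: str = "/etc/lnet.conf") -> list[str]:
    """Return lnetctl command lines for a static multi-rail setup (sketch only)."""
    cmds = ["modprobe lnet", "lnetctl lnet configure"]
    # One LNet network backed by several local interfaces (multi-rail NIs).
    cmds.append(f"lnetctl net add --net {net} --if {','.join(interfaces)}")
    # Peers are added by hand, assuming dynamic discovery is not available.
    for peer in peers:
        cmd = f"lnetctl peer add --prim_nid {peer.primary_nid}"
        if peer.other_nids:
            cmd += f" --nid {','.join(peer.other_nids)}"
        cmds.append(cmd)
    # Save the running configuration so it can be loaded again at boot.
    cmds.append(f"lnetctl export > {export_path}")
    return cmds


if __name__ == "__main__":
    server = MultiRailPeer("10.0.0.2@o2ib", ["10.0.1.2@o2ib"])  # hypothetical NIDs
    for line in build_lnetctl_cmds("o2ib", ["ib0", "ib1"], [server]):
        print(line)
```

Each 'peer add' line names the peer's primary NID and then its other NIDs, which is how a peer is declared multi-rail when it cannot be discovered automatically.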
> - Defining peers ('lnetctl peer add' and ARP settings) on the client
> only seems to make multi-rail work in both directions.
>
> I'm a bit puzzled by the last observation. I expected that both ends
> would need to define peers. The client NID does not show as multi-rail
> (lnetctl peer show) on the server.
>
> Cheers,
> Hans Henrik
>
> On 14-03-2018 03:00, Riccardo Veraldi wrote:
>> It works for me, but you have to set up lnet.conf correctly, either
>> manually or by using lnetctl to add peers. Then you export your
>> configuration to lnet.conf and it will be loaded at reboot. I had to
>> add my peers manually; I think peer auto-discovery is not yet
>> operational in 2.10.3.
>> I assume you are no longer using lustre.conf to configure interfaces
>> (ib, tcp) and that you are using the new Lustre DLC (Dynamic LNet
>> Configuration) style:
>>
>> http://wiki.lustre.org/Dynamic_LNET_Configuration
>>
>> Also, I don't know if you have done this yet, but if you use
>> multi-rail you should configure the ARP settings and rt_tables for
>> your IB interfaces. Here is an example; I had to do this to get
>> things working properly:
>>
>> https://wiki.hpdd.intel.com/display/LNet/MR+Cluster+Setup
>>
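A sketch of the per-interface Linux settings mentioned above, again as a Python helper that only prints shell commands. The sysctl values (arp_ignore=1, arp_filter=0, arp_announce=2, rp_filter=0) and the one-routing-table-per-interface scheme are assumptions based on commonly published multi-rail setups; compare them with the wiki page before use. The function name mr_linux_cmds, the table IDs and the addresses are hypothetical.

```python
from __future__ import annotations

import ipaddress


def mr_linux_cmds(ifname: str, cidr: str, table_id: int) -> list[str]:
    """Per-interface ARP and source-routing commands for multi-rail (sketch).

    cidr is the interface address with its prefix, e.g. "10.0.0.1/24".
    """
    iface = ipaddress.ip_interface(cidr)
    addr, subnet = iface.ip, iface.network
    conf = f"net.ipv4.conf.{ifname}"
    return [
        # Reply to ARP only for addresses configured on the receiving interface.
        f"sysctl -w {conf}.arp_ignore=1",
        f"sysctl -w {conf}.arp_filter=0",
        # Pick the best local address for the target as the ARP source.
        f"sysctl -w {conf}.arp_announce=2",
        # Disable reverse-path filtering on this interface.
        f"sysctl -w {conf}.rp_filter=0",
        f"ip neigh flush dev {ifname}",
        # One routing table per interface, selected by source address.
        f"echo '{table_id} {ifname}' >> /etc/iproute2/rt_tables",
        f"ip route add {subnet} dev {ifname} proto kernel scope link src {addr} table {ifname}",
        f"ip rule add from {addr} table {ifname}",
    ]


if __name__ == "__main__":
    # Hypothetical addressing: ib0 and ib1 on the same IPoIB subnet.
    for name, cidr, table in [("ib0", "10.0.0.1/24", 200), ("ib1", "10.0.0.2/24", 201)]:
        print("\n".join(mr_linux_cmds(name, cidr, table)))
```

Each IB interface gets its own table ID. The rt_tables line appends unconditionally, so it should be run once rather than on every boot. Because the kernel applies the larger of net.ipv4.conf.all.rp_filter and the per-interface value, the "all" setting may also need to be 0.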
>> You may also want to check that your IB interfaces (if you have a
>> dual-port InfiniBand card like I do) can really double the performance
>> when you enable both of them. The card's PCIe bandwidth has to be able
>> to feed enough traffic to both ports; otherwise the second port will
>> only be useful as a failover device, without improving the speed as
>> you might want.
>>
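The PCIe point can be checked with simple arithmetic. The sketch below compares the raw data rate of a PCIe slot (signalling rate times line-code efficiency times lanes, per direction) with the sum of the IB port rates. It ignores PCIe packet overhead, so real throughput is lower; the port rates are nominal values supplied by the caller, and the function names are hypothetical.

```python
from __future__ import annotations

# Per-lane signalling rate (GT/s) and line-code efficiency per PCIe generation.
PCIE_LANE = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def pcie_gbps(gen: int, lanes: int) -> float:
    """Upper bound on PCIe data rate in Gb/s, one direction, before packet overhead."""
    rate, efficiency = PCIE_LANE[gen]
    return rate * efficiency * lanes


def ports_fit(gen: int, lanes: int, port_gbps: list[float]) -> bool:
    """True if, on paper, the slot can carry all ports at full rate simultaneously."""
    return pcie_gbps(gen, lanes) >= sum(port_gbps)


if __name__ == "__main__":
    print(f"PCIe 3.0 x16: {pcie_gbps(3, 16):.1f} Gb/s per direction")
    print("2 x FDR (56 Gb/s nominal):", ports_fit(3, 16, [56, 56]))
    print("2 x EDR (100 Gb/s nominal):", ports_fit(3, 16, [100, 100]))
```

For a PCIe 3.0 x16 slot this gives about 126 Gb/s per direction, so on paper two 56 Gb/s FDR ports fit while two 100 Gb/s EDR ports do not; in the latter case the second port adds mostly failover rather than bandwidth.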
>> In my configuration failover is working. If I disconnect one port, the
>> other still works. Of course, if you disconnect it while traffic is
>> going through it, you may have a problem with that stream of data, but
>> new traffic will be handled correctly. I do not know if there is a way
>> to avoid this; I am just talking about my experience, and as I said I
>> am more interested in performance than failover.
>>
>>
>> Riccardo
>>
>>
>> On 3/13/18 8:05 AM, Hans Henrik Happe wrote:
>>> Hi,
>>>
>>> I'm testing LNET multi-rail with 2.10.3 and I ran into some questions
>>> that I couldn't find answers to in the documentation or elsewhere.
>>>
>>> As I understand the design document, "Dynamic peer discovery" will
>>> make it possible to discover multi-rail peers without adding them
>>> manually. Is that functionality in 2.10.3?
>>>
>>> Will failover work without doing anything special? I've tested with
>>> two IB ports, and unplugging one resulted in no I/O from the client;
>>> replugging it didn't resolve this.
>>>
>>> How do I make an active/passive setup? One example I would really
>>> like to see in the documentation is the obvious o2ib-tcp combination,
>>> where tcp is used if o2ib is down and it fails back when o2ib comes up
>>> again.
>>>
>>> Is anyone using MR in production? I've done a bit of testing with dual
>>> IB on both server and client and had a few crashes.
>>>
>>> Cheers,
>>> Hans Henrik
>>> _______________________________________________
>>> lustre-discuss mailing list
>>> lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
>>
>>