[lustre-discuss] LNET Multi-rail

Hans Henrik Happe happe at nbi.dk
Tue Apr 10 07:15:11 PDT 2018

Thanks for the info. A few observations I found so far:

- I think LU-10297 has solved my stability issues.
- lustre.conf does work with comma-separated interfaces, i.e. 
o2ib(ib0,ib1). However, peers need to be configured with ldev.conf or 
- Defining peering ('lnetctl peer add' and ARP settings) on the client 
only seems to be enough to make multi-rail work both ways.
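
To make the observations above concrete, here is a rough sketch of a
client-side setup like the one described. The NIDs (10.0.0.1@o2ib and
10.0.0.2@o2ib) are made-up placeholders, not addresses from this thread,
and the run/mr_client_peer_setup helpers are hypothetical names used
only for illustration. By default the script prints the commands instead
of executing them; setting APPLY=1 is assumed to run them on a node with
lnetctl installed.

```shell
#!/bin/sh
# Sketch only: all NIDs below are hypothetical placeholders.
# Module-parameter style, as it would appear in a modprobe lustre.conf:
#   options lnet networks="o2ib(ib0,ib1)"

# Print each command unless APPLY=1 is set, so nothing runs by accident.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Assumed server with two IB ports: a primary NID plus one additional NID.
SERVER_PRIM_NID="10.0.0.1@o2ib"
SERVER_EXTRA_NID="10.0.0.2@o2ib"

mr_client_peer_setup() {
    # Declare the server as a multi-rail peer on the client.
    run lnetctl peer add --prim_nid "$SERVER_PRIM_NID" --nid "$SERVER_EXTRA_NID"
    # Inspect the resulting peer table.
    run lnetctl peer show
}

mr_client_peer_setup
```

Running the same 'lnetctl peer show' on the server is how one would
check whether the client NID is listed as multi-rail there.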

I'm a bit puzzled by the last observation. I expected that both ends 
would need to define peers. The client NID does not show as multi-rail 
(lnetctl peer show) on the server.

Hans Henrik

On 14-03-2018 03:00, Riccardo Veraldi wrote:
> It works for me, but you have to set up lnet.conf correctly, either
> manually or by using lnetctl to add peers. Then you export your
> configuration to lnet.conf
> and it will be loaded at reboot. I had to add my peers manually; I think
> peer auto discovery is not yet operational on 2.10.3.
> I suppose you are no longer using lustre.conf to configure interfaces
> (ib,tcp) and that you are using the new Lustre DLC style:
> http://wiki.lustre.org/Dynamic_LNET_Configuration
> Also I do not know if you did this yet but you should configure ARP
> settings and also rt_tables for your ib interfaces if you use multi-rail.
> Here is an example. I had to do that to have things working properly:
> https://wiki.hpdd.intel.com/display/LNet/MR+Cluster+Setup
> You may also want to check that your IB interfaces (if you have a dual
> port infiniband like I have) can really double the performance when you
> enable both of them.
> The InfiniBand PCIe card bandwidth has to be capable of feeding enough
> traffic to both ports, or the second port will just be useful as a
> failover device, without improving the speed as you may want it to.
> In my configuration fail over is working. If I disconnect one port, the
> other will still work. Of course if you disconnect it when traffic is
> going through
> you may have a problem with that stream of data. But new traffic will be
> handled correctly. I do not know if there is a way to avoid this. I am
> just talking about my experience and, as I said, I am more interested in
> performance than failover.
> Riccardo
> On 3/13/18 8:05 AM, Hans Henrik Happe wrote:
>> Hi,
>> I'm testing LNET multi-rail with 2.10.3 and I ran into some questions
>> that I couldn't find in the documentation or elsewhere.
>> As I understand the design document, "Dynamic peer discovery" will make
>> it possible to discover multi-rail peers without adding them manually?
>> Is that functionality in 2.10.3?
>> Will failover work without doing anything special? I've tested with
>> two IB ports and unplugging resulted in no I/O from client and
>> replugging didn't resolve it.
>> How do I make an active/passive setup? One example I would really
>> like to see in the documentation is the obvious o2ib-tcp combination,
>> where tcp is used if o2ib is down and fails back if it comes up again.
>> Anyone using MR in production? Done a bit of testing with dual ib on
>> both server and client and had a few crashes.
>> Cheers,
>> Hans Henrik