[lustre-discuss] LNET Multi-rail

Riccardo Veraldi Riccardo.Veraldi at cnaf.infn.it
Tue Mar 13 19:00:28 PDT 2018


It works for me, but you have to set up lnet.conf correctly, either by
editing it manually or by using lnetctl to add peers. You then export
your configuration to lnet.conf and it will be loaded at reboot. I had
to add my peers manually; I think peer auto discovery is not yet
operational in 2.10.3.
I assume you are no longer using lustre.conf to configure interfaces
(ib, tcp) and that you are using the new Lustre DLC style:

http://wiki.lustre.org/Dynamic_LNET_Configuration
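
As a rough illustration of the manual setup described above, here is a
small Python sketch that only builds (and prints) the lnetctl command
lines for a node with two IB ports and one multi-rail peer. The
interface names, the NIDs and the /etc/lnet.conf path are made-up
placeholders, and the helper names are hypothetical; please check the
lnetctl syntax against your Lustre version before running anything.

```python
def local_net_cmds(net, interfaces):
    """Commands to start LNet and attach several interfaces to one net."""
    return [
        ["lnetctl", "lnet", "configure"],
        ["lnetctl", "net", "add", "--net", net, "--if", ",".join(interfaces)],
    ]


def peer_add_cmd(primary_nid, other_nids):
    """Command declaring one multi-rail peer: primary NID plus extra NIDs."""
    return ["lnetctl", "peer", "add", "--prim_nid", primary_nid,
            "--nid", ",".join(other_nids)]


if __name__ == "__main__":
    # Placeholder values (assumptions): two local IB ports, one peer
    # that is reachable through two NIDs on the same o2ib network.
    cmds = local_net_cmds("o2ib", ["ib0", "ib1"])
    cmds.append(peer_add_cmd("192.168.1.10@o2ib", ["192.168.1.11@o2ib"]))
    for cmd in cmds:
        print(" ".join(cmd))
    # Saving the running configuration so it can be loaded at boot;
    # the target path is an assumption.
    print("lnetctl export > /etc/lnet.conf")
```

The script does not execute anything; it only prints the commands so
they can be reviewed first.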

Also, I do not know if you have done this yet, but if you use multi-rail
you should configure the ARP settings and rt_tables for your ib
interfaces. I had to do that to get things working properly. Here is an
example:

https://wiki.hpdd.intel.com/display/LNet/MR+Cluster+Setup
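
To give an idea of what this kind of per-interface setup can look like,
here is a hedged Python sketch that generates the sysctl and ip commands
for one interface. The sysctl values, table ID, address and subnet are
assumptions chosen for illustration (typical settings for several
interfaces on the same subnet), not values taken from my systems; the
page linked above is the reference to follow. The helper names are
hypothetical.

```python
def arp_sysctl_cmds(iface):
    """ARP-related sysctl commands for one interface (assumed values)."""
    settings = {"arp_ignore": 1, "arp_filter": 0,
                "arp_announce": 2, "rp_filter": 0}
    return [f"sysctl -w net.ipv4.conf.{iface}.{key}={value}"
            for key, value in settings.items()]


def routing_cmds(iface, table_id, address, subnet):
    """Commands giving one interface its own routing table and source rule."""
    return [
        # Register a named routing table for this interface.
        f"echo '{table_id} {iface}' >> /etc/iproute2/rt_tables",
        # Route the subnet through this interface inside that table.
        f"ip route add {subnet} dev {iface} proto kernel scope link "
        f"src {address} table {iface}",
        # Traffic sourced from this address uses the interface's table.
        f"ip rule add from {address} table {iface}",
    ]


if __name__ == "__main__":
    # Placeholder addresses and table IDs (assumptions).
    ports = [("ib0", 200, "192.168.1.1"), ("ib1", 201, "192.168.1.2")]
    for iface, table_id, address in ports:
        for line in arp_sysctl_cmds(iface):
            print(line)
        for line in routing_cmds(iface, table_id, address, "192.168.0.0/16"):
            print(line)
    print("ip route flush cache")
```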

You may also want to check that your IB interfaces (if you have a
dual-port InfiniBand card like I have) can really double the performance
when you enable both of them.
The PCIe link of the InfiniBand card has to provide enough bandwidth to
feed both ports at full rate; otherwise the second port is only useful
as a failover device and will not improve the speed as you might want.
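
A back-of-the-envelope check of this point can be sketched in Python.
The PCIe generation, lane count and InfiniBand rate below are example
inputs, not a description of my hardware, and the calculation only
accounts for line encoding, so the real usable PCIe bandwidth will be
somewhat lower than the figure it gives.

```python
# Per-lane transfer rate (GT/s) and line-encoding efficiency by PCIe gen.
PCIE_GEN = {2: (5.0, 8 / 10), 3: (8.0, 128 / 130), 4: (16.0, 128 / 130)}

# Approximate data rate in Gb/s of one 4x InfiniBand port.
IB_PORT_GBPS = {"QDR": 32.0, "FDR": 54.5, "EDR": 100.0}


def pcie_gbps(gen, lanes):
    """Upper bound on PCIe slot bandwidth in Gb/s (encoding overhead only)."""
    rate, efficiency = PCIE_GEN[gen]
    return rate * lanes * efficiency


def can_feed_both_ports(gen, lanes, ib_rate, ports=2):
    """True if the slot bandwidth is at least the sum of the port rates."""
    return pcie_gbps(gen, lanes) >= ports * IB_PORT_GBPS[ib_rate]


if __name__ == "__main__":
    for gen, lanes, ib_rate in [(3, 8, "FDR"), (3, 16, "FDR"), (3, 16, "EDR")]:
        slot = pcie_gbps(gen, lanes)
        need = 2 * IB_PORT_GBPS[ib_rate]
        verdict = "enough" if can_feed_both_ports(gen, lanes, ib_rate) else "not enough"
        print(f"PCIe gen{gen} x{lanes}: {slot:.1f} Gb/s, "
              f"dual {ib_rate} needs {need:.1f} Gb/s -> {verdict}")
```

With these numbers a PCIe 3.0 x8 slot tops out at about 63 Gb/s, which
is below the roughly 109 Gb/s that two FDR ports could carry, so in that
case the second port would mostly act as a failover path.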

In my configuration failover is working. If I disconnect one port, the
other still works. Of course, if you disconnect a port while traffic is
going through it, you may have a problem with that stream of data, but
new traffic will be handled correctly. I do not know if there is a way
to avoid this; I am just talking about my own experience and, as I said,
I am more interested in performance than failover.


Riccardo


On 3/13/18 8:05 AM, Hans Henrik Happe wrote:
> Hi,
>
> I'm testing LNET multi-rail with 2.10.3 and I ran into some questions
> that I couldn't find in the documentation or elsewhere.
>
> As I understand the design document, "Dynamic peer discovery" will make
> it possible to discover multi-rail peers without adding them manually?
> Is that functionality in 2.10.3?
>
> Will failover work without doing anything special? I've tested with
> two IB ports and unplugging resulted in no I/O from client and
> replugging didn't resolve it.
>
> How do I make an active/passive setup? One example I would really
> like to see in the documentation is the obvious o2ib-tcp combination,
> where tcp is used if o2ib is down and fails back if it comes up again.
>
> Anyone using MR in production? I have done a bit of testing with dual
> ib on both server and client and had a few crashes.
>
> Cheers,
> Hans Henrik
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org




