[lustre-discuss] problems with lnet peers on lustre 2.11.0

Riccardo Veraldi Riccardo.Veraldi at cnaf.infn.it
Tue May 8 18:07:22 PDT 2018


Hello,
I have problems with my LNet configuration on Lustre 2.11.0.
Everything starts just fine, but after a while LNet auto-discovers peers
and adds the tcp network interfaces of my OSSes and clients,
so that clients start writing to the Lustre partition over tcp instead of
o2ib.
I use and need tcp only to contact the MDS, and o2ib to contact the
OSSes. This configuration always worked with Lustre 2.10.*

I tried to switch off automatic peer discovery, but it did not work.
I also tried not using lnet.conf at all and just using
/etc/modprobe/lustre.conf with

options lnet networks=o2ib(ib0),tcp(eth0)

but it seems that Lustre 2.11.0 does not like it anymore.
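To make explicit what the networks= value asks LNet for, it can be split into network/interface pairs. This is only a sketch: `split_networks` is a hypothetical helper, not part of Lustre, and it assumes each network names exactly one interface, as in the line above.

```shell
# split_networks is a hypothetical helper, not a Lustre tool.
# It prints one "<net> <interface>" pair per line for a networks= value.
# Assumption: one interface per network, e.g. o2ib(ib0),tcp(eth0).
split_networks() {
    printf '%s\n' "$1" |
        tr ',' '\n' |
        sed -n 's/^\([a-z0-9]*\)(\([^)]*\))$/\1 \2/p'
}

# Value taken from the modprobe line above.
split_networks 'o2ib(ib0),tcp(eth0)'
```

For the value above this prints `o2ib ib0` and `tcp eth0`, i.e. one o2ib network on ib0 and one tcp network on eth0.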

So I went back to lnet.conf, but I can't make it stop auto-discovering
tcp interfaces.

After a while the tcp interfaces start to appear although I did not
configure them, and they take over from o2ib.

How can I prevent the use of the tcp interfaces on my OSSes and clients
and give priority to the o2ib interface?


 lnetctl export | grep tcp
          tcp bonding: 0
    - net type: tcp
        - nid: 172.21.42.211@tcp
          tcp bonding: 0
          tcp bonding: 0
    - primary nid: 172.21.42.202@tcp
        - nid: 172.21.42.202@tcp
    - primary nid: 172.21.42.213@tcp
        - nid: 172.21.42.213@tcp

So 172.21.42.202@tcp is used instead of the InfiniBand interface, and
it was discovered automatically.
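The same check can be scripted so that only the unwanted tcp peers are printed. This is a rough sketch, not a Lustre tool: `unexpected_tcp_peers` is a hypothetical helper that reads `lnetctl export` output on stdin, takes the MDS NID as its only argument, and accepts both the `addr@tcp` and the `addr at tcp` spelling of a NID.

```shell
# unexpected_tcp_peers is a hypothetical helper, not part of Lustre.
# stdin: output of `lnetctl export`; $1: the one tcp NID that is expected (the MDS).
# It prints every other primary NID that sits on a tcp network.
unexpected_tcp_peers() {
    mds_nid="$1"
    # Normalise "addr at tcp" to "addr@tcp" before matching.
    sed 's/ at /@/' |
        awk '/primary nid:/ { print $NF }' |
        grep '@tcp' |
        grep -vxF "$mds_nid"
}

# Sample input copied from the grep output above.
unexpected_tcp_peers '172.21.42.213@tcp' <<'EOF'
    - primary nid: 172.21.42.202@tcp
        - nid: 172.21.42.202@tcp
    - primary nid: 172.21.42.213@tcp
        - nid: 172.21.42.213@tcp
EOF
```

On the sample this prints only `172.21.42.202@tcp`. On a live node the input would come from `lnetctl export | unexpected_tcp_peers 172.21.42.213@tcp`.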

This is the configuration on my OSS where 172.21.42.213 is the MDS.

net:
    - net type: tcp
      local NI(s):
        - nid: 172.21.42.211@tcp
          status: up
          interfaces:
              0: enp1s0f0
    - net type: o2ib
      local NI(s):
        - nid: 172.21.52.86@o2ib
          status: up
          interfaces:
              0: ib0
peer:
    - primary nid: 172.21.42.213@tcp
      Multi-Rail: False
      peer ni:
        - nid: 172.21.42.213@tcp
          state: NA
    - primary nid: 172.21.52.126@o2ib
      Multi-Rail: False
      peer ni:
        - nid: 172.21.52.126@o2ib
          state: NA
    - primary nid: 172.21.52.127@o2ib
      Multi-Rail: False
      peer ni:
        - nid: 172.21.52.127@o2ib
          state: NA
    - primary nid: 172.21.52.128@o2ib
      Multi-Rail: False
      peer ni:
        - nid: 172.21.52.128@o2ib
          state: NA
    - primary nid: 172.21.52.129@o2ib
      Multi-Rail: False
      peer ni:
        - nid: 172.21.52.129@o2ib
          state: NA
    - primary nid: 172.21.52.130@o2ib
      Multi-Rail: False
      peer ni:
        - nid: 172.21.52.130@o2ib
          state: NA
    - primary nid: 172.21.52.131@o2ib
      Multi-Rail: False
      peer ni:
        - nid: 172.21.52.131@o2ib
          state: NA
global:
    numa_range: 0
    discovery: 0
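To confirm what a node actually reports for discovery, the value can be read from the global: section of the exported configuration. A minimal sketch, assuming the layout shown above (a top-level `global:` key followed by an indented `discovery: N` line); `discovery_setting` is a hypothetical helper name, not a Lustre command.

```shell
# discovery_setting is a hypothetical helper, not part of Lustre.
# stdin: output of `lnetctl export`; prints the value of "discovery:" from the
# global: section only, ignoring any similarly named key elsewhere.
discovery_setting() {
    awk '
        /^global:/   { in_global = 1; next }   # entering the global: section
        /^[a-zA-Z]/  { in_global = 0 }         # any other top-level key ends it
        in_global && $1 == "discovery:" { print $2 }
    '
}

# Sample input copied from the configuration above.
discovery_setting <<'EOF'
global:
    numa_range: 0
    discovery: 0
EOF
```

On the sample this prints `0`, matching the configuration above. On a live node the input would be `lnetctl export | discovery_setting`; a value of 1 after startup would suggest the setting from lnet.conf is not the one in effect.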


thanks a lot


Rick

