<html>
  <head>

    <meta http-equiv="content-type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>Hello.</p>
    <p>Here I am again, trying to get Multi-Rail to work.</p>
    <p>I configured Multi-Rail on both the OSS and the client side.</p>
    <p>I have one OSS, one MDS and one client, running RHEL 7.4 and Lustre 2.10.1:</p>
    <ul>
      <li>psdrp-tst-mds10: MDS</li>
      <li>drp-tst-oss10: OSS (172.21.52.86@o2ib, 172.21.52.118@o2ib)</li>
      <li>drp-tst-lu10: Lustre client (172.21.52.124@o2ib, 172.21.52.125@o2ib)</li>
    </ul>
    <p>Without Multi-Rail, everything works fine.</p>
    <p>What I am trying to do is aggregate two IB interfaces to get more
      performance. However, when I mount the Lustre file system from the
      Lustre client, I get the following errors and the file system does
      not mount:</p>
    <p>Oct  9 16:23:50 drp-tst-lu10 kernel: [248177.914832] LNetError: 1895:0:(o2iblnd_cb.c:2726:kiblnd_rejected()) 172.21.52.118@o2ib rejected: consumer defined fatal error<br>
      Oct  9 16:23:50 drp-tst-lu10 kernel: [248177.917290] Lustre: Mounted drplu-client<br>
      Oct  9 16:23:50 drp-tst-lu10 kernel: [248177.920832] Lustre: 31785:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1507591430/real 1507591430]  req@ffff8807f56a0300 x1580812428378832/t0(0) o8-&gt;drplu-OST0001-osc-ffff88084738d800@172.21.52.86@o2ib:28/4 lens 520/544 e 0 to 1 dl 1507591435 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1<br>
      Oct  9 16:23:52 drp-tst-lu10 kernel: [248179.936156] LustreError:
      673:0:(llite_lib.c:1748:ll_statfs_internal()) obd_statfs fails: rc
      = -5<br>
      Oct  9 16:23:57 drp-tst-lu10 kernel: [248184.645463] LustreError:
      674:0:(llite_lib.c:1748:ll_statfs_internal()) obd_statfs fails: rc
      = -5<br>
      Oct  9 16:23:58 drp-tst-lu10 kernel: [248186.117364] LustreError:
      678:0:(llite_lib.c:1748:ll_statfs_internal()) obd_statfs fails: rc
      = -5<br>
      Oct  9 16:23:58 drp-tst-lu10 kernel: [248186.117411] LustreError:
      678:0:(llite_lib.c:1748:ll_statfs_internal()) Skipped 1 previous
      similar message<br>
      Oct  9 16:24:15 drp-tst-lu10 kernel: [248202.912554] LNetError:
      1895:0:(o2iblnd_cb.c:2726:kiblnd_rejected()) 172.21.52.118@o2ib
      rejected: consumer defined fatal error<br>
      Oct  9 16:24:15 drp-tst-lu10 kernel: [248202.912610] LNetError:
      1895:0:(o2iblnd_cb.c:2726:kiblnd_rejected()) Skipped 3 previous
      similar messages<br>
      Oct  9 16:24:15 drp-tst-lu10 kernel: [248202.918903] Lustre: 31785:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1507591455/real 1507591455]  req@ffff88075d2ee700 x1580812428378960/t0(0) o8-&gt;drplu-OST0001-osc-ffff88084738d800@172.21.52.86@o2ib:28/4 lens 520/544 e 0 to 1 dl 1507591465 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1<br>
    </p>
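    <p>The log mixes two symptoms: 172.21.52.118@o2ib (the OSS NI on ib1) rejecting the client's connection attempts, and requests to 172.21.52.86@o2ib (the OSS NI on ib0) failing with a network error. A minimal Python sketch that tallies both per NID from a kernel log such as /var/log/messages, assuming each message sits on a single line with exactly the wording shown above. The function name summarize is hypothetical and not part of any Lustre tool.</p>
    <pre>
```python
import re
from collections import Counter

# A Lustre NID such as 172.21.52.118@o2ib (IPv4 address plus network name).
NID = r"(\d+\.\d+\.\d+\.\d+@\w+)"

# Assumed message formats, copied from the log excerpt in this post; each
# kernel message is expected on one line, as written to /var/log/messages.
REJECT_RE = re.compile(r"kiblnd_rejected\(\)\) " + NID + r" rejected")
EXPIRE_RE = re.compile(r"ptlrpc_expire_one_request\(\)\).*?-osc-\w+@" + NID)


def summarize(log_text):
    """Hypothetical helper: count o2iblnd rejections and expired RPCs per NID."""
    rejected = Counter(REJECT_RE.findall(log_text))
    failed = Counter(EXPIRE_RE.findall(log_text))
    return rejected, failed
```
</pre>
    <p>Because the kernel rate-limits repeats ("Skipped 3 previous similar messages" lines are not matched), the counts are lower bounds, not exact totals.</p>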
    <p>fstab entry: 172.21.42.213@tcp:/drplu /drplu lustre
      noauto,lazystatfs,flock, 0 0<br>
    </p>
    <p>I can see the peers in the LNet status:</p>
    <pre>[root@drp-tst-oss10:~]# cat /proc/sys/lnet/peers
nid                      refs state  last   max   rtr   min    tx   min queue
172.21.52.124@o2ib          1    NA    -1   128   128   128   128   128 0
172.21.52.125@o2ib          1    NA    -1   128   128   128   128   128 0
172.21.42.213@tcp           1    NA    -1     8     8     8     8     6 0
</pre>
    <pre>[root@drp-tst-lu10:etc]# cat /proc/sys/lnet/peers
nid                      refs state  last   max   rtr   min    tx   min queue
172.21.52.118@o2ib          1    NA    -1   128   128   128   128   127 0
172.21.52.86@o2ib           1    NA    -1   128   128   128   128   102 0
172.21.42.213@tcp           1    NA    -1     8     8     8     8     6 0
</pre>
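    <p>The two "min" columns in these tables are easy to confuse. Matching the values against the lnetctl export output in this post suggests that they are the minimum router credits and the minimum tx credits (for example, 172.21.52.86@o2ib shows min_tx_credits: 102 on the client). A minimal Python sketch that parses the table under that assumption and lists the peers whose tx credits have ever dropped below the maximum, which would suggest they have seen outgoing traffic. The column names and helper names are hypothetical.</p>
    <pre>
```python
# Hypothetical column names; the two "min" columns are assumed to be the
# minimum router credits and the minimum tx credits, respectively.
COLUMNS = ["nid", "refs", "state", "last", "max", "rtr",
           "rtr_min", "tx", "tx_min", "queue"]


def parse_peers(text):
    """Parse /proc/sys/lnet/peers output into one dict per peer NID."""
    peers = []
    for line in text.splitlines():
        fields = line.split()
        # Skip prompts, the header and blank lines: a peer row has exactly
        # one field per column and starts with a NID containing "@".
        if len(fields) != len(COLUMNS) or "@" not in fields[0]:
            continue
        row = dict(zip(COLUMNS, fields))
        for key in COLUMNS[1:]:
            if key != "state":
                row[key] = int(row[key])
        peers.append(row)
    return peers


def peers_with_tx_activity(peers):
    """NIDs whose minimum tx credits ever fell below the maximum."""
    return [p["nid"] for p in peers if p["max"] > p["tx_min"]]
```
</pre>
    <p>Applied to the OSS table above, this would return only 172.21.42.213@tcp: by this reading, neither client o2ib NID has consumed a tx credit on the OSS side.</p>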
    <p>Here is my LNet configuration with Multi-Rail on the OSS side:</p>
    <p>[root@drp-tst-oss10:veraldi]# lnetctl export<br>
      net:<br>
          - net type: lo<br>
            local NI(s):<br>
              - nid: 0@lo<br>
                status: up<br>
                statistics:<br>
                    send_count: 0<br>
                    recv_count: 0<br>
                    drop_count: 0<br>
                tunables:<br>
                    peer_timeout: 0<br>
                    peer_credits: 0<br>
                    peer_buffer_credits: 0<br>
                    credits: 0<br>
                lnd tunables:<br>
                tcp bonding: 0<br>
                dev cpt: 0<br>
                CPT: "[0,1]"<br>
          - net type: o2ib<br>
            local NI(s):<br>
              - nid: 172.21.52.86@o2ib<br>
                status: up<br>
                interfaces:<br>
                    0: ib0<br>
                statistics:<br>
                    send_count: 0<br>
                    recv_count: 0<br>
                    drop_count: 0<br>
                tunables:<br>
                    peer_timeout: 180<br>
                    peer_credits: 128<br>
                    peer_buffer_credits: 0<br>
                    credits: 1024<br>
                lnd tunables:<br>
                    peercredits_hiw: 64<br>
                    map_on_demand: 32<br>
                    concurrent_sends: 256<br>
                    fmr_pool_size: 2048<br>
                    fmr_flush_trigger: 512<br>
                    fmr_cache: 1<br>
                    ntx: 2048<br>
                    conns_per_peer: 4<br>
                tcp bonding: 0<br>
                dev cpt: 1<br>
                CPT: "[0,1]"<br>
              - nid: 172.21.52.118@o2ib<br>
                status: up<br>
                interfaces:<br>
                    0: ib1<br>
                statistics:<br>
                    send_count: 0<br>
                    recv_count: 0<br>
                    drop_count: 0<br>
                tunables:<br>
                    peer_timeout: 180<br>
                    peer_credits: 128<br>
                    peer_buffer_credits: 0<br>
                    credits: 1024<br>
                lnd tunables:<br>
                    peercredits_hiw: 64<br>
                    map_on_demand: 32<br>
                    concurrent_sends: 256<br>
                    fmr_pool_size: 2048<br>
                    fmr_flush_trigger: 512<br>
                    fmr_cache: 1<br>
                    ntx: 2048<br>
                    conns_per_peer: 4<br>
                tcp bonding: 0<br>
                dev cpt: 1<br>
                CPT: "[0,1]"<br>
          - net type: tcp<br>
            local NI(s):<br>
              - nid: 172.21.42.211@tcp<br>
                status: up<br>
                interfaces:<br>
                    0: enp1s0f0<br>
                statistics:<br>
                    send_count: 198<br>
                    recv_count: 198<br>
                    drop_count: 0<br>
                tunables:<br>
                    peer_timeout: 180<br>
                    peer_credits: 8<br>
                    peer_buffer_credits: 0<br>
                    credits: 256<br>
                lnd tunables:<br>
                tcp bonding: 0<br>
                dev cpt: 0<br>
                CPT: "[0,1]"<br>
      peer:<br>
          - primary nid: 172.21.42.213@tcp<br>
            Multi-Rail: True<br>
            peer ni:<br>
              - nid: 172.21.42.213@tcp<br>
                state: NA<br>
                max_ni_tx_credits: 8<br>
                available_tx_credits: 8<br>
                min_tx_credits: 6<br>
                tx_q_num_of_buf: 0<br>
                available_rtr_credits: 8<br>
                min_rtr_credits: 8<br>
                send_count: 198<br>
                recv_count: 198<br>
                drop_count: 0<br>
                refcount: 1<br>
          - primary nid: 172.21.52.124@o2ib<br>
            Multi-Rail: True<br>
            peer ni:<br>
              - nid: 172.21.52.124@o2ib<br>
                state: NA<br>
                max_ni_tx_credits: 128<br>
                available_tx_credits: 128<br>
                min_tx_credits: 128<br>
                tx_q_num_of_buf: 0<br>
                available_rtr_credits: 128<br>
                min_rtr_credits: 128<br>
                send_count: 0<br>
                recv_count: 0<br>
                drop_count: 0<br>
                refcount: 1<br>
              - nid: 172.21.52.125@o2ib<br>
                state: NA<br>
                max_ni_tx_credits: 128<br>
                available_tx_credits: 128<br>
                min_tx_credits: 128<br>
                tx_q_num_of_buf: 0<br>
                available_rtr_credits: 128<br>
                min_rtr_credits: 128<br>
                send_count: 0<br>
                recv_count: 0<br>
                drop_count: 0<br>
                refcount: 1<br>
      numa:<br>
          range: 0</p>
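    <p>In the peer section of this export, the OSS lists both client NIDs (172.21.52.124@o2ib and 172.21.52.125@o2ib) under the single primary NID 172.21.52.124@o2ib. A minimal Python sketch that extracts that grouping from lnetctl export output, assuming the layout shown here; it is a line-based parser for this layout only, not a general YAML parser, and the name peer_groups is hypothetical.</p>
    <pre>
```python
def peer_groups(export_text):
    """Hypothetical helper: map each 'primary nid' to its peer NI NIDs."""
    groups = {}
    current = None
    in_peer_section = False
    for raw in export_text.splitlines():
        line = raw.strip()
        # Top-level keys switch sections; only "peer:" entries are collected.
        if line == "peer:":
            in_peer_section = True
        elif line in ("net:", "numa:"):
            in_peer_section = False
        elif in_peer_section and line.startswith("- primary nid:"):
            current = line.split(":", 1)[1].strip()
            groups[current] = []
        elif in_peer_section and current and line.startswith("- nid:"):
            groups[current].append(line.split(":", 1)[1].strip())
    return groups
```
</pre>
    <p>Run against the client export, the same function should group 172.21.52.86@o2ib and 172.21.52.118@o2ib under the primary NID 172.21.52.86@o2ib.</p>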
    <p>Here is the LNet configuration on the client side:</p>
    <p>[root@drp-tst-lu10:veraldi]# lnetctl export<br>
      net:<br>
          - net type: lo<br>
            local NI(s):<br>
              - nid: 0@lo<br>
                status: up<br>
                statistics:<br>
                    send_count: 0<br>
                    recv_count: 0<br>
                    drop_count: 0<br>
                tunables:<br>
                    peer_timeout: 0<br>
                    peer_credits: 0<br>
                    peer_buffer_credits: 0<br>
                    credits: 0<br>
                lnd tunables:<br>
                tcp bonding: 0<br>
                dev cpt: 0<br>
                CPT: "[0]"<br>
          - net type: o2ib<br>
            local NI(s):<br>
              - nid: 172.21.52.124@o2ib<br>
                status: up<br>
                interfaces:<br>
                    0: ib0<br>
                statistics:<br>
                    send_count: 403742<br>
                    recv_count: 807391<br>
                    drop_count: 0<br>
                tunables:<br>
                    peer_timeout: 180<br>
                    peer_credits: 128<br>
                    peer_buffer_credits: 0<br>
                    credits: 1024<br>
                lnd tunables:<br>
                    peercredits_hiw: 64<br>
                    map_on_demand: 32<br>
                    concurrent_sends: 256<br>
                    fmr_pool_size: 2048<br>
                    fmr_flush_trigger: 512<br>
                    fmr_cache: 1<br>
                    ntx: 2048<br>
                    conns_per_peer: 4<br>
                tcp bonding: 0<br>
                dev cpt: -1<br>
                CPT: "[0]"<br>
              - nid: 172.21.52.125@o2ib<br>
                status: up<br>
                interfaces:<br>
                    0: ib1<br>
                statistics:<br>
                    send_count: 0<br>
                    recv_count: 0<br>
                    drop_count: 0<br>
                tunables:<br>
                    peer_timeout: 180<br>
                    peer_credits: 128<br>
                    peer_buffer_credits: 0<br>
                    credits: 1024<br>
                lnd tunables:<br>
                    peercredits_hiw: 64<br>
                    map_on_demand: 32<br>
                    concurrent_sends: 256<br>
                    fmr_pool_size: 2048<br>
                    fmr_flush_trigger: 512<br>
                    fmr_cache: 1<br>
                    ntx: 2048<br>
                    conns_per_peer: 4<br>
                tcp bonding: 0<br>
                dev cpt: -1<br>
                CPT: "[0]"<br>
          - net type: tcp<br>
            local NI(s):<br>
              - nid: 172.21.42.195@tcp<br>
                status: up<br>
                interfaces:<br>
                    0: enp7s0f0<br>
                statistics:<br>
                    send_count: 99<br>
                    recv_count: 99<br>
                    drop_count: 0<br>
                tunables:<br>
                    peer_timeout: 180<br>
                    peer_credits: 8<br>
                    peer_buffer_credits: 0<br>
                    credits: 256<br>
                lnd tunables:<br>
                tcp bonding: 0<br>
                dev cpt: -1<br>
                CPT: "[0]"<br>
      peer:<br>
          - primary nid: 172.21.42.213@tcp<br>
            Multi-Rail: True<br>
            peer ni:<br>
              - nid: 172.21.42.213@tcp<br>
                state: NA<br>
                max_ni_tx_credits: 8<br>
                available_tx_credits: 8<br>
                min_tx_credits: 6<br>
                tx_q_num_of_buf: 0<br>
                available_rtr_credits: 8<br>
                min_rtr_credits: 8<br>
                send_count: 99<br>
                recv_count: 99<br>
                drop_count: 0<br>
                refcount: 1<br>
          - primary nid: 172.21.52.86@o2ib<br>
            Multi-Rail: True<br>
            peer ni:<br>
              - nid: 172.21.52.86@o2ib<br>
                state: NA<br>
                max_ni_tx_credits: 128<br>
                available_tx_credits: 128<br>
                min_tx_credits: 102<br>
                tx_q_num_of_buf: 0<br>
                available_rtr_credits: 128<br>
                min_rtr_credits: 128<br>
                send_count: 403742<br>
                recv_count: 807391<br>
                drop_count: 0<br>
                refcount: 1<br>
              - nid: 172.21.52.118@o2ib<br>
                state: NA<br>
                max_ni_tx_credits: 128<br>
                available_tx_credits: 128<br>
                min_tx_credits: 127<br>
                tx_q_num_of_buf: 0<br>
                available_rtr_credits: 128<br>
                min_rtr_credits: 128<br>
                send_count: 0<br>
                recv_count: 0<br>
                drop_count: 0<br>
                refcount: 1<br>
      numa:<br>
          range: 0</p>
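    <p>In this client export, 172.21.52.124@o2ib (ib0) has send_count 403742 and recv_count 807391, while 172.21.52.125@o2ib (ib1) shows 0 for both; on the OSS, both o2ib NIs show 0. A minimal Python sketch that pulls the per-NI counters out of lnetctl export output for a quick side-by-side comparison, assuming the layout shown above; the name local_ni_counters is hypothetical.</p>
    <pre>
```python
def local_ni_counters(export_text):
    """Hypothetical helper: {nid: (send_count, recv_count)} for local NIs."""
    counters = {}
    current = None
    in_net_section = False
    for raw in export_text.splitlines():
        line = raw.strip()
        # Only the "net:" section describes local NIs; peer NIs are skipped.
        if line == "net:":
            in_net_section = True
        elif line in ("peer:", "numa:"):
            in_net_section = False
        elif in_net_section and line.startswith("- nid:"):
            current = line.split(":", 1)[1].strip()
            counters[current] = [0, 0]
        elif in_net_section and current and line.startswith("send_count:"):
            counters[current][0] = int(line.split(":", 1)[1])
        elif in_net_section and current and line.startswith("recv_count:"):
            counters[current][1] = int(line.split(":", 1)[1])
    return {nid: tuple(v) for nid, v in counters.items()}
```
</pre>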
    <p>So the peers are there, but Lustre still does not work. This is really weird; it should.</p>
    <p>Any hints?</p>
    <p>Thank you,</p>
    <p>Rick</p>
  </body>
</html>