<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">
      I ran my LNet self-test again, and this time, adding
      --concurrency=16, I can use all of the IB bandwidth (3.5 GB/sec).<br>
      <br>
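      For reference, a minimal sketch of where the option goes, reusing the
      group and batch names from the lst script quoted further down. The LST
      variable and the add_brw_tests wrapper are hypothetical helpers, added
      only so that the commands can be dry-run; the value 16 is the one that
      worked here and may not suit other setups.<br>
      <pre>
```shell
# LST and add_brw_tests are illustrative names, not part of lst itself.
# Set LST=echo to print the commands instead of running them.
LST="${LST:-lst}"
CONCURRENCY=16

add_brw_tests() {
  # Same two tests as in the original script, with the concurrency option added.
  "$LST" add_test --batch bulk_rw --concurrency="$CONCURRENCY" \
    --from readers --to servers brw read check=simple size=1M
  "$LST" add_test --batch bulk_rw --concurrency="$CONCURRENCY" \
    --from writers --to servers brw write check=full size=1M
}
```
</pre>
      Calling <tt>LST=echo add_brw_tests</tt> prints the two command lines
      without touching a running session.<br>
      <br>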
      The only thing I do not understand is why ko2iblnd.conf is not
      loaded properly; I had to remove the alias from the config file
      for<br>
      the proper peer_credits settings to be loaded.<br>
      <br>
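      A rough way to check which values the module really picked up is to
      read them back from sysfs. This is only a sketch: it assumes the
      ko2iblnd module is loaded and exposes its parameters under
      /sys/module/ko2iblnd/parameters, and the function name and its optional
      directory argument are hypothetical, added for illustration.<br>
      <pre>
```shell
# Print every ko2iblnd parameter as name=value.
# The optional argument exists only so the function can be pointed at
# another directory; the default is the usual sysfs location (assumed).
show_ko2iblnd_params() {
  dir="${1:-/sys/module/ko2iblnd/parameters}"
  for f in "$dir"/*; do
    # Skip the unexpanded glob when the directory is missing or empty.
    [ -e "$f" ] || continue
    printf '%s=%s\n' "$(basename "$f")" "$(cat "$f")"
  done
}

show_ko2iblnd_params
```
</pre>
      If peer_credits still shows the default after a reload, the options
      line was most likely not applied to the module name that was loaded.<br>
      <br>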
      Thanks to everyone for helping.<br>
      <br>
      Riccardo<br>
      <br>
      On 8/19/17 8:54 AM, Riccardo Veraldi wrote:<br>
    </div>
    <blockquote
      cite="mid:2f72090f-2e2e-af19-d3c4-eba790913d30@cnaf.infn.it"
      type="cite">
      <div class="moz-cite-prefix"><br>
        I found out that ko2iblnd is not picking up its settings from
        /etc/modprobe.d/ko2iblnd.conf:<br>
        <tt>alias ko2iblnd-opa ko2iblnd</tt><tt><br>
        </tt><tt>options ko2iblnd-opa peer_credits=128
          peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048
          map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512
          fmr_cache=1 conns_per_peer=4</tt><tt><br>
        </tt><tt><br>
        </tt><tt>install ko2iblnd /usr/sbin/ko2iblnd-probe</tt><tt><br>
        </tt><br>
        But if I modify ko2iblnd.conf like this, the settings are
        loaded:<br>
        <br>
        <tt>options ko2iblnd peer_credits=128 peer_credits_hiw=64
          credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32
          fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1
          conns_per_peer=4</tt><tt><br>
        </tt><tt><br>
        </tt><tt>install ko2iblnd /usr/sbin/ko2iblnd-probe</tt><br>
        <br>
        LNet tests show better behaviour, but I would still expect more
        than this.<br>
        Is it possible to tune parameters in /etc/modprobe.d/ko2iblnd.conf
        so that the Mellanox ConnectX-3 works more efficiently?<br>
        <br>
        [LNet Rates of servers]<br>
        [R] Avg: 2286     RPC/s Min: 0        RPC/s Max: 4572     RPC/s<br>
        [W] Avg: 3322     RPC/s Min: 0        RPC/s Max: 6643     RPC/s<br>
        [LNet Bandwidth of servers]<br>
        [R] Avg: 625.23   MiB/s Min: 0.00     MiB/s Max: 1250.46  MiB/s
        <br>
        [W] Avg: 1035.85  MiB/s Min: 0.00     MiB/s Max: 2071.69  MiB/s
        <br>
        [LNet Rates of servers]<br>
        [R] Avg: 2286     RPC/s Min: 1        RPC/s Max: 4571     RPC/s<br>
        [W] Avg: 3321     RPC/s Min: 1        RPC/s Max: 6641     RPC/s<br>
        [LNet Bandwidth of servers]<br>
        [R] Avg: 625.55   MiB/s Min: 0.00     MiB/s Max: 1251.11  MiB/s
        <br>
        [W] Avg: 1035.05  MiB/s Min: 0.00     MiB/s Max: 2070.11  MiB/s
        <br>
        [LNet Rates of servers]<br>
        [R] Avg: 2291     RPC/s Min: 0        RPC/s Max: 4581     RPC/s<br>
        [W] Avg: 3329     RPC/s Min: 0        RPC/s Max: 6657     RPC/s<br>
        [LNet Bandwidth of servers]<br>
        [R] Avg: 626.55   MiB/s Min: 0.00     MiB/s Max: 1253.11  MiB/s
        <br>
        [W] Avg: 1038.05  MiB/s Min: 0.00     MiB/s Max: 2076.11  MiB/s
        <br>
        session is ended<br>
        ./lnet_test.sh: line 17: 23394 Terminated              lst stat
        servers<br>
        <br>
        <br>
        <br>
        <br>
        On 8/19/17 4:20 AM, Arman Khalatyan wrote:<br>
      </div>
      <blockquote
cite="mid:CAAqDm6YrdsPzQ3=5LFgSBkSNgEdGLMtjhmD88tjw1cBsy2d7CA@mail.gmail.com"
        type="cite">
        <div dir="auto">Just a minor comment:
          <div dir="auto">you should push up the performance of your
            nodes; they are not running at their maximum CPU frequencies, so
            all tests might be inconsistent. To get the most out of IB, run
            the following:</div>
          <div dir="auto">tuned-adm profile latency-performance</div>
          <div dir="auto">for more options use:</div>
          <div dir="auto">tuned-adm list</div>
          <div dir="auto"><br>
          </div>
          <div dir="auto">It will be interesting to see the difference.</div>
        </div>
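        <p>A quick check that can be run before and after switching profiles,
          sketched under the assumption that the nodes expose the cpufreq
          files in sysfs (not every system does). The function name and its
          optional base-directory argument are hypothetical; the active
          profile itself can be shown with <tt>tuned-adm active</tt>.</p>
        <pre>
```shell
# Count how many cores use each frequency governor.
# The optional argument defaults to the usual sysfs CPU directory (assumed).
list_governors() {
  base="${1:-/sys/devices/system/cpu}"
  cat "$base"/cpu[0-9]*/cpufreq/scaling_governor | sort | uniq -c
}
```
</pre>
        <p>Calling <tt>list_governors</tt> on each node gives one line per
          governor with the number of cores using it, which makes it easy to
          see whether a profile change took effect.</p>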
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On 19.08.2017 at 3:57 AM,
            "Riccardo Veraldi" &lt;<a moz-do-not-send="true"
              href="mailto:Riccardo.Veraldi@cnaf.infn.it">Riccardo.Veraldi@cnaf.infn.it</a>&gt; wrote:<br
              type="attribution">
            <blockquote class="quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">
                <div class="m_-5514293233688902802moz-cite-prefix">Hello
                  Keith and Dennis, these are the tests I ran.<br>
                  <br>
                  <ul>
                    <li>obdfilter-survey shows that I can saturate disk
                      performance; the NVMe/ZFS backend is performing
                      very well and is faster than my InfiniBand
                      network.</li>
                  </ul>
                  <p><b><tt>pool          alloc   free   read  write  
                        read  write</tt></b><b><tt><br>
                      </tt></b><b><tt>------------  -----  -----  ----- 
                        -----  -----  -----</tt></b><b><tt><br>
                      </tt></b><b><tt>drpffb-ost01  3.31T  3.19T      3 
                        35.7K  16.0K  7.03G</tt></b><b><tt><br>
                      </tt></b><b><tt>  raidz1      3.31T  3.19T      3 
                        35.7K  16.0K  7.03G</tt></b><b><tt><br>
                      </tt></b><b><tt>    nvme0n1       -      -      1 
                        5.95K  7.99K  1.17G</tt></b><b><tt><br>
                      </tt></b><b><tt>    nvme1n1       -      -      0 
                        6.01K      0  1.18G</tt></b><b><tt><br>
                      </tt></b><b><tt>    nvme2n1       -      -      0 
                        5.93K      0  1.17G</tt></b><b><tt><br>
                      </tt></b><b><tt>    nvme3n1       -      -      0 
                        5.88K      0  1.16G</tt></b><b><tt><br>
                      </tt></b><b><tt>    nvme4n1       -      -      1 
                        5.95K  7.99K  1.17G</tt></b><b><tt><br>
                      </tt></b><b><tt>    nvme5n1       -      -      0 
                        5.96K      0  1.17G</tt></b><b><tt><br>
                      </tt></b><b><tt>------------  -----  -----  ----- 
                        -----  -----  -----</tt></b><br>
                  </p>
                  These are the test results:<br>
                  <br>
                  <tt>Fri Aug 18 16:54:48 PDT 2017 Obdfilter-survey for
                    case=disk from drp-tst-ffb01</tt><tt><br>
                  </tt><tt>ost  1 sz 10485760K rsz 1024K obj    1 thr   
                    1 write<b> 7633.08   </b>          SHORT rewrite
                    7558.78             SHORT read 3205.24 [3213.70,
                    3226.78] </tt><tt><br>
                  </tt><tt>ost  1 sz 10485760K rsz 1024K obj    1 thr   
                    2 write<b> 7996.89 </b>            SHORT rewrite
                    7903.42             SHORT read 5264.70            
                    SHORT </tt><tt><br>
                  </tt><tt>ost  1 sz 10485760K rsz 1024K obj    2 thr   
                    2 write <b>7718.94</b>             SHORT rewrite
                    7977.84             SHORT read 5802.17            
                    SHORT </tt><tt><br>
                  </tt><br>
                  <ul>
                    <li>LNet self-test, and here I see the problems. For
                      reference, 172.21.52.[83,84] are the two OSSes and
                      172.21.52.86 is the reader/writer. Here is the
                      script that I ran:</li>
                  </ul>
                  <p><tt>#!/bin/bash</tt><tt><br>
                    </tt><tt>export LST_SESSION=$$</tt><tt><br>
                    </tt><tt>lst new_session read_write</tt><tt><br>
                    </tt><tt>lst add_group servers
                      172.21.52.[83,84]@o2ib5</tt><tt><br>
                    </tt><tt>lst add_group readers 172.21.52.86@o2ib5</tt><tt><br>
                    </tt><tt>lst add_group writers 172.21.52.86@o2ib5</tt><tt><br>
                    </tt><tt>lst add_batch bulk_rw</tt><tt><br>
                    </tt><tt>lst add_test --batch bulk_rw --from readers
                      --to servers \</tt><tt><br>
                    </tt><tt>brw read check=simple size=1M</tt><tt><br>
                    </tt><tt>lst add_test --batch bulk_rw --from writers
                      --to servers \</tt><tt><br>
                    </tt><tt>brw write check=full size=1M</tt><tt><br>
                    </tt><tt># start running</tt><tt><br>
                    </tt><tt>lst run bulk_rw</tt><tt><br>
                    </tt><tt># display server stats for 30 seconds</tt><tt><br>
                    </tt><tt>lst stat servers & sleep 30; kill $!</tt><tt><br>
                    </tt><tt># tear down</tt><tt><br>
                    </tt><tt>lst end_session</tt><br>
                  </p>
                  <p><br>
                  </p>
                  <p>Here are the results:<br>
                  </p>
                  <p><tt>SESSION: read_write FEATURES: 1 TIMEOUT: 300
                      FORCE: No</tt><tt><br>
                    </tt><tt>172.21.52.[83,84]@o2ib5 are added to
                      session</tt><tt><br>
                    </tt><tt>172.21.52.86@o2ib5 are added to session</tt><tt><br>
                    </tt><tt>172.21.52.86@o2ib5 are added to session</tt><tt><br>
                    </tt><tt>Test was added successfully</tt><tt><br>
                    </tt><tt>Test was added successfully</tt><tt><br>
                    </tt><tt>bulk_rw is running now</tt><tt><br>
                    </tt><tt>[LNet Rates of servers]</tt><tt><br>
                    </tt><tt>[R] Avg: 1751     RPC/s Min: 0        RPC/s
                      Max: 3502     RPC/s</tt><tt><br>
                    </tt><tt>[W] Avg: 2525     RPC/s Min: 0        RPC/s
                      Max: 5050     RPC/s</tt><tt><br>
                    </tt><tt>[LNet Bandwidth of servers]</tt><tt><br>
                    </tt><tt>[R] Avg: 488.79   MiB/s Min: 0.00     MiB/s
                      Max: 977.59   MiB/s </tt><tt><br>
                    </tt><tt>[W] Avg: 773.99   MiB/s Min: 0.00     MiB/s
                      Max: 1547.99  MiB/s </tt><tt><br>
                    </tt><tt>[LNet Rates of servers]</tt><tt><br>
                    </tt><tt>[R] Avg: 1718     RPC/s Min: 0        RPC/s
                      Max: 3435     RPC/s</tt><tt><br>
                    </tt><tt>[W] Avg: 2479     RPC/s Min: 0        RPC/s
                      Max: 4958     RPC/s</tt><tt><br>
                    </tt><tt>[LNet Bandwidth of servers]</tt><tt><br>
                    </tt><tt>[R] Avg: 478.19   MiB/s Min: 0.00     MiB/s
                      Max: 956.39   MiB/s </tt><tt><br>
                    </tt><tt>[W] Avg: 761.74   MiB/s Min: 0.00     MiB/s
                      Max: 1523.47  MiB/s </tt><tt><br>
                    </tt><tt>[LNet Rates of servers]</tt><tt><br>
                    </tt><tt>[R] Avg: 1734     RPC/s Min: 0        RPC/s
                      Max: 3467     RPC/s</tt><tt><br>
                    </tt><tt>[W] Avg: 2506     RPC/s Min: 0        RPC/s
                      Max: 5012     RPC/s</tt><tt><br>
                    </tt><tt>[LNet Bandwidth of servers]</tt><tt><br>
                    </tt><tt>[R] Avg: 480.79   MiB/s Min: 0.00     MiB/s
                      Max: 961.58   MiB/s </tt><tt><br>
                    </tt><tt>[W] Avg: 772.49   MiB/s Min: 0.00     MiB/s
                      Max: 1544.98  MiB/s </tt><tt><br>
                    </tt><tt>[LNet Rates of servers]</tt><tt><br>
                    </tt><tt>[R] Avg: 1722     RPC/s Min: 0        RPC/s
                      Max: 3444     RPC/s</tt><tt><br>
                    </tt><tt>[W] Avg: 2486     RPC/s Min: 0        RPC/s
                      Max: 4972     RPC/s</tt><tt><br>
                    </tt><tt>[LNet Bandwidth of servers]</tt><tt><br>
                    </tt><tt>[R] Avg: 479.09   MiB/s Min: 0.00     MiB/s
                      Max: 958.18   MiB/s </tt><tt><br>
                    </tt><tt>[W] Avg: 764.19   MiB/s Min: 0.00     MiB/s
                      Max: 1528.38  MiB/s </tt><tt><br>
                    </tt><tt>[LNet Rates of servers]</tt><tt><br>
                    </tt><tt>[R] Avg: 1741     RPC/s Min: 0        RPC/s
                      Max: 3482     RPC/s</tt><tt><br>
                    </tt><tt>[W] Avg: 2513     RPC/s Min: 0        RPC/s
                      Max: 5025     RPC/s</tt><tt><br>
                    </tt><tt>[LNet Bandwidth of servers]</tt><tt><br>
                    </tt><tt>[R] Avg: 484.59   MiB/s Min: 0.00     MiB/s
                      Max: 969.19   MiB/s </tt><tt><br>
                    </tt><tt>[W] Avg: 771.94   MiB/s Min: 0.00     MiB/s
                      Max: 1543.87  MiB/s </tt><tt><br>
                    </tt><tt>session is ended</tt><tt><br>
                    </tt><tt>./lnet_test.sh: line 17:  4940
                      Terminated              lst stat servers</tt><tt><br>
                    </tt><br>
                  </p>
                  So it looks like LNet is really underperforming, reaching
                  half or less of what InfiniBand is capable of.<br>
                  How can I find out what is causing this?
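                  <p>To put a number on that gap, here is a small sketch that
                    expresses an lst figure as a percentage of the raw IB
                    bandwidth. It takes the peak values from the first lst
                    sample above and the large-message average from the send
                    bandwidth test below, and it assumes the MiB/s and MB/sec
                    units of the two tools are close enough to compare
                    directly. The function name is hypothetical.</p>
                  <pre>
```shell
# Print "got" as a percentage of "link", rounded to one decimal place.
pct_of_link() {
  awk -v got="$1" -v link="$2" 'BEGIN { printf "%.1f\n", 100 * got / link }'
}

# Peak write and read bandwidth from lst versus the IB send bandwidth average.
pct_of_link 1547.99 3738.65
pct_of_link 977.59 3738.65
```
</pre>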
                  <p>Running bandwidth tests with the InfiniBand perf tools, I
                    get good results:</p>
                  <p><tt><br>
                    </tt></p>
                  <p><tt>******************************<wbr>******</tt><tt><br>
                    </tt><tt>* Waiting for client to connect... *</tt><tt><br>
                    </tt><tt>******************************<wbr>******</tt><tt><br>
                    </tt><tt><br>
                    </tt><tt>------------------------------<wbr>------------------------------<wbr>---------------------------</tt><tt><br>
                    </tt><tt>                    Send BW Test</tt><tt><br>
                    </tt><tt> Dual-port       : OFF       
                      Device         : mlx4_0</tt><tt><br>
                    </tt><tt> Number of qps   : 1        Transport type
                      : IB</tt><tt><br>
                    </tt><tt> Connection type : RC        Using SRQ     
                      : OFF</tt><tt><br>
                    </tt><tt> RX depth        : 512</tt><tt><br>
                    </tt><tt> CQ Moderation   : 100</tt><tt><br>
                    </tt><tt> Mtu             : 2048[B]</tt><tt><br>
                    </tt><tt> Link type       : IB</tt><tt><br>
                    </tt><tt> Max inline data : 0[B]</tt><tt><br>
                    </tt><tt> rdma_cm QPs     : OFF</tt><tt><br>
                    </tt><tt> Data ex. method : Ethernet</tt><tt><br>
                    </tt><tt>------------------------------<wbr>------------------------------<wbr>---------------------------</tt><tt><br>
                    </tt><tt> local address: LID 0x07 QPN 0x020f PSN
                      0xacc37a</tt><tt><br>
                    </tt><tt> remote address: LID 0x0a QPN 0x020f PSN
                      0x91a069</tt><tt><br>
                    </tt><tt>------------------------------<wbr>------------------------------<wbr>---------------------------</tt><tt><br>
                    </tt><tt> #bytes     #iterations    BW
                      peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]</tt><tt><br>
                    </tt><tt>Conflicting CPU frequency values detected:
                      1249.234000 != 1326.000000. CPU Frequency is not
                      max.</tt><tt><br>
                    </tt><tt> 2          1000            
                      0.00               11.99             6.285330</tt><tt><br>
                    </tt><tt>Conflicting CPU frequency values detected:
                      1314.910000 != 1395.460000. CPU Frequency is not
                      max.</tt><tt><br>
                    </tt><tt> 4          1000            
                      0.00               28.26             7.409324</tt><tt><br>
                    </tt><tt>Conflicting CPU frequency values detected:
                      1314.910000 != 1460.207000. CPU Frequency is not
                      max.</tt><tt><br>
                    </tt><tt> 8          1000            
                      0.00               54.47             7.139164</tt><tt><br>
                    </tt><tt>Conflicting CPU frequency values detected:
                      1314.910000 != 1244.320000. CPU Frequency is not
                      max.</tt><tt><br>
                    </tt><tt> 16         1000            
                      0.00               113.13            7.413889</tt><tt><br>
                    </tt><tt>Conflicting CPU frequency values detected:
                      1314.910000 != 1460.207000. CPU Frequency is not
                      max.</tt><tt><br>
                    </tt><tt> 32         1000            
                      0.00               226.07            7.407811</tt><tt><br>
                    </tt><tt>Conflicting CPU frequency values detected:
                      1469.703000 != 1301.031000. CPU Frequency is not
                      max.</tt><tt><br>
                    </tt><tt> 64         1000            
                      0.00               452.12            7.407465</tt><tt><br>
                    </tt><tt>Conflicting CPU frequency values detected:
                      1469.703000 != 1301.031000. CPU Frequency is not
                      max.</tt><tt><br>
                    </tt><tt> 128        1000            
                      0.00               845.45            6.925918</tt><tt><br>
                    </tt><tt>Conflicting CPU frequency values detected:
                      1469.703000 != 1362.257000. CPU Frequency is not
                      max.</tt><tt><br>
                    </tt><tt> 256        1000            
                      0.00               1746.93           7.155406</tt><tt><br>
                    </tt><tt>Conflicting CPU frequency values detected:
                      1469.703000 != 1362.257000. CPU Frequency is not
                      max.</tt><tt><br>
                    </tt><tt> 512        1000            
                      0.00               2766.93           5.666682</tt><tt><br>
                    </tt><tt>Conflicting CPU frequency values detected:
                      1296.714000 != 1204.675000. CPU Frequency is not
                      max.</tt><tt><br>
                    </tt><tt> 1024       1000            
                      0.00               3516.26           3.600646</tt><tt><br>
                    </tt><tt>Conflicting CPU frequency values detected:
                      1296.714000 != 1325.535000. CPU Frequency is not
                      max.</tt><tt><br>
                    </tt><tt> 2048       1000            
                      0.00               3630.93           1.859035</tt><tt><br>
                    </tt><tt>Conflicting CPU frequency values detected:
                      1296.714000 != 1331.312000. CPU Frequency is not
                      max.</tt><tt><br>
                    </tt><tt> 4096       1000            
                      0.00               3702.39           0.947813</tt><tt><br>
                    </tt><tt>Conflicting CPU frequency values detected:
                      1296.714000 != 1200.027000. CPU Frequency is not
                      max.</tt><tt><br>
                    </tt><tt> 8192       1000            
                      0.00               3724.82           0.476777</tt><tt><br>
                    </tt><tt>Conflicting CPU frequency values detected:
                      1384.902000 != 1314.113000. CPU Frequency is not
                      max.</tt><tt><br>
                    </tt><tt> 16384      1000            
                      0.00               3731.21           0.238798</tt><tt><br>
                    </tt><tt>Conflicting CPU frequency values detected:
                      1578.078000 != 1200.027000. CPU Frequency is not
                      max.</tt><tt><br>
                    </tt><tt> 32768      1000            
                      0.00               3735.32           0.119530</tt><tt><br>
                    </tt><tt>Conflicting CPU frequency values detected:
                      1578.078000 != 1200.027000. CPU Frequency is not
                      max.</tt><tt><br>
                    </tt><tt> 65536      1000            
                      0.00               3736.98           0.059792</tt><tt><br>
                    </tt><tt>Conflicting CPU frequency values detected:
                      1578.078000 != 1200.027000. CPU Frequency is not
                      max.</tt><tt><br>
                    </tt><tt> 131072     1000            
                      0.00               3737.80           0.029902</tt><tt><br>
                    </tt><tt>Conflicting CPU frequency values detected:
                      1578.078000 != 1200.027000. CPU Frequency is not
                      max.</tt><tt><br>
                    </tt><tt> 262144     1000            
                      0.00               3738.43           0.014954</tt><tt><br>
                    </tt><tt>Conflicting CPU frequency values detected:
                      1570.507000 != 1200.027000. CPU Frequency is not
                      max.</tt><tt><br>
                    </tt><tt> 524288     1000            
                      0.00               3738.50           0.007477</tt><tt><br>
                    </tt><tt>Conflicting CPU frequency values detected:
                      1457.019000 != 1236.152000. CPU Frequency is not
                      max.</tt><tt><br>
                    </tt><tt> 1048576    1000            
                      0.00               3738.65           0.003739</tt><tt><br>
                    </tt><tt>Conflicting CPU frequency values detected:
                      1411.597000 != 1234.957000. CPU Frequency is not
                      max.</tt><tt><br>
                    </tt><tt> 2097152    1000            
                      0.00               3738.65           0.001869</tt><tt><br>
                    </tt><tt>Conflicting CPU frequency values detected:
                      1369.828000 != 1516.851000. CPU Frequency is not
                      max.</tt><tt><br>
                    </tt><tt> 4194304    1000            
                      0.00               3738.80           0.000935</tt><tt><br>
                    </tt><tt>Conflicting CPU frequency values detected:
                      1564.664000 != 1247.574000. CPU Frequency is not
                      max.</tt><tt><br>
                    </tt><tt> 8388608    1000            
                      0.00               3738.76           0.000467</tt><tt><br>
                    </tt><tt>------------------------------<wbr>------------------------------<wbr>---------------------------</tt><tt><br>
                    </tt><tt><br>
                    </tt></p>
                  <p><tt>RDMA modules are loaded</tt><tt><br>
                    </tt><tt><br>
                    </tt><tt>rpcrdma                90366  0 </tt><tt><br>
                    </tt><tt>rdma_ucm               26837  0 </tt><tt><br>
                    </tt><tt>ib_uverbs              51854  2
                      ib_ucm,rdma_ucm</tt><tt><br>
                    </tt><tt>rdma_cm                53755  5
                      rpcrdma,ko2iblnd,ib_iser,rdma_<wbr>ucm,ib_isert</tt><tt><br>
                    </tt><tt>ib_cm                  47149  5
                      rdma_cm,ib_srp,ib_ucm,ib_srpt,<wbr>ib_ipoib</tt><tt><br>
                    </tt><tt>iw_cm                  46022  1 rdma_cm</tt><tt><br>
                    </tt><tt>ib_core               210381  15
                      rdma_cm,ib_cm,iw_cm,rpcrdma,<wbr>ko2iblnd,mlx4_ib,ib_srp,ib_<wbr>ucm,ib_iser,ib_srpt,ib_umad,<wbr>ib_uverbs,rdma_ucm,ib_ipoib,<wbr>ib_isert</tt><tt><br>
                    </tt><tt>sunrpc                334343  17
                      nfs,nfsd,rpcsec_gss_krb5,auth_<wbr>rpcgss,lockd,nfsv4,rpcrdma,<wbr>nfs_acl</tt><tt><br>
                    </tt></p>
                  <p>I do not know where to look to make LNet perform
                    faster. I am running my ib0 interface in connected
                    mode with an MTU of 65520.</p>
                  <p>Any hint will be much appreciated</p>
                  <p>thank you</p>
                  <p>Rick</p>
                  <div class="quoted-text">
                    <p><br>
                    </p>
                    <p><br>
                    </p>
                    <p><br>
                    </p>
                    On 8/18/17 9:05 AM, Mannthey, Keith wrote:<br>
                  </div>
                </div>
                <div class="elided-text">
                  <blockquote type="cite">
                    <pre>I would suggest a few other tests to help isolate where the issue might be.  

1. What is the single thread "DD" write speed?
 
2. LNet self-test:  Please see "Chapter 28. Testing Lustre Network Performance (LNet Self-Test)" in the Lustre manual if this is a new test for you. 
This will help show how much LNet bandwidth you have from your single client.  There are tunables in the LNet layer that can affect things.  Which QDR HCA are you using?

3. obdfilter-survey:  Please see "29.3. Testing OST Performance (obdfilter-survey)" in the Lustre manual.  This test will help demonstrate what the backend NVMe/ZFS setup can do at the OBD layer in Lustre.  

Thanks,
 Keith 
-----Original Message-----
From: lustre-discuss [<a moz-do-not-send="true" class="m_-5514293233688902802moz-txt-link-freetext" href="mailto:lustre-discuss-bounces@lists.lustre.org" target="_blank">mailto:lustre-discuss-<wbr>bounces@lists.lustre.org</a>] On Behalf Of Riccardo Veraldi
Sent: Thursday, August 17, 2017 10:48 PM
To: Dennis Nelson <a moz-do-not-send="true" class="m_-5514293233688902802moz-txt-link-rfc2396E" href="mailto:dnelson@ddn.com" target="_blank"><dnelson@ddn.com></a>; <a moz-do-not-send="true" class="m_-5514293233688902802moz-txt-link-abbreviated" href="mailto:lustre-discuss@lists.lustre.org" target="_blank">lustre-discuss@lists.lustre.<wbr>org</a>
Subject: Re: [lustre-discuss] Lustre poor performance

this is my lustre.conf

[drp-tst-ffb01:~]$ cat /etc/modprobe.d/lustre.conf
options lnet networks=o2ib5(ib0),tcp5(enp1s0f0)

data transfer is over infiniband

ib0: flags=4163&lt;UP,BROADCAST,RUNNING,MULTICAST&gt;  mtu 65520
        inet 172.21.52.83  netmask 255.255.252.0  broadcast 172.21.55.255


On 8/17/17 10:45 PM, Riccardo Veraldi wrote:
</pre>
      <blockquote type="cite">
        <pre>On 8/17/17 9:22 PM, Dennis Nelson wrote:
</pre>
        <blockquote type="cite">
          <pre>It appears that you are running iozone on a single client?  What kind of network is tcp5?  Have you looked at the network to make sure it is not the bottleneck?

</pre>
        </blockquote>
        <pre>yes the data transfer is on ib0 interface and I did a memory to memory 
test through InfiniBand QDR  resulting in 3.7GB/sec.
tcp is used to connect to the MDS. It is tcp5 to differentiate it from 
my other many Lustre clusters. I could have called it tcp but it does 
not make any difference performance wise.
I ran the test from one single node yes, I ran the same test also 
locally on a zpool identical to the one on the Lustre OSS.
I have 4 identical servers, each of them with the same NVMe disks:

server1: OSS - OST1 Lustre/ZFS  raidz1

server2: OSS - OST2 Lustre/ZFS  raidz1

server3: local ZFS raidz1

server4: Lustre client



______________________________<wbr>_________________
lustre-discuss mailing list
<a moz-do-not-send="true" class="m_-5514293233688902802moz-txt-link-abbreviated" href="mailto:lustre-discuss@lists.lustre.org" target="_blank">lustre-discuss@lists.lustre.<wbr>org</a>
<a moz-do-not-send="true" class="m_-5514293233688902802moz-txt-link-freetext" href="http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org" target="_blank">http://lists.lustre.org/<wbr>listinfo.cgi/lustre-discuss-<wbr>lustre.org</a>
</pre>
      </blockquote>
    </blockquote>
    <p>

    </p>
  </div></div>




</blockquote></div>
</div>



</blockquote><p>
</p>


</blockquote><p>
</p></body></html>