<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 8/18/17 7:05 PM, Dennis Nelson
wrote:<br>
</div>
<blockquote cite="mid:1F3A28F9-C3C5-4B80-B0A3-AD5CDB31F62A@ddn.com"
type="cite">
<div>If all four servers are identical and all have IB, why are
you specifying tcp when mounting the client?<br>
</div>
</blockquote>
Because the MDS does not have InfiniBand, only an Ethernet
connection. Only the OSSes and the client have InfiniBand, on the ib0 interface.<br>
<br>
This is my ldev.conf:<br>
<br>
psdrp-tst-mds01 - mgs zfs:drpffb-mgs/mgs<br>
psdrp-tst-mds01 - mdt0 zfs:drpffb-mdt0/mdt0<br>
#<br>
drp-tst-ffb01 - OST01 zfs:drpffb-ost01/ost01<br>
drp-tst-ffb02 - OST02 zfs:drpffb-ost02/ost02<br>
<br>
This is my lustre.conf on the OSSes and the Lustre client:<br>
<br>
options lnet networks=o2ib5(ib0),tcp5(enp1s0f0)<br>
<br>
This is my lustre.conf on the MDS:<br>
<br>
options lnet networks=tcp5(eth0)<br>
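<p>A small sketch of what these two settings imply. It lists the LNet network names in each <tt>networks=</tt> line and prints the ones two nodes have in common. <tt>lnet_nets</tt> and <tt>shared_nets</tt> are hypothetical helper names. The parsing only handles the simple <tt>net(interface),net(interface)</tt> form used here, with no quoting and no <tt>ip2nets</tt>.</p>
<pre>
```shell
#!/bin/bash
# Hypothetical helpers: compare the LNet networks configured on two nodes.
# Only the simple "networks=net(iface),net(iface)" syntax is handled.

lnet_nets() {
  # Print one LNet network name per line, dropping the "(interface)" part.
  echo "$1" | sed -n 's/.*networks=\([^ ]*\).*/\1/p' | tr ',' '\n' | sed 's/(.*)//'
}

shared_nets() {
  # Print every network name that appears in both configuration lines.
  local a b
  for a in $(lnet_nets "$1"); do
    for b in $(lnet_nets "$2"); do
      if [ "$a" = "$b" ]; then
        echo "$a"
      fi
    done
  done
}

oss_client='options lnet networks=o2ib5(ib0),tcp5(enp1s0f0)'
mds='options lnet networks=tcp5(eth0)'

echo "client to MDS:"; shared_nets "$oss_client" "$mds"         # tcp5
echo "client to OSS:"; shared_nets "$oss_client" "$oss_client"  # o2ib5, tcp5
```
</pre>
<p>The client and the MDS have only tcp5 in common. That is why the client mount has to name the MGS/MDS on tcp, while the client and the OSSes also share o2ib5.</p>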
<br>
<blockquote cite="mid:1F3A28F9-C3C5-4B80-B0A3-AD5CDB31F62A@ddn.com"
type="cite">
<div><br>
On Aug 18, 2017, at 8:57 PM, Riccardo Veraldi <<a
moz-do-not-send="true"
href="mailto:Riccardo.Veraldi@cnaf.infn.it">Riccardo.Veraldi@cnaf.infn.it</a>>
wrote:<br>
<br>
</div>
<blockquote type="cite">
<div>
<div class="moz-cite-prefix">Hello Keith and Dennis, these are
the tests I ran.<br>
<br>
<ul>
<li>obdfilter-survey shows that I can saturate the disks:
the NVMe/ZFS backend is performing very well, and it is
faster than my InfiniBand network.
</li>
</ul>
<pre>                capacity     operations    bandwidth
pool          alloc   free   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
drpffb-ost01  3.31T  3.19T      3  35.7K  16.0K  7.03G
  raidz1      3.31T  3.19T      3  35.7K  16.0K  7.03G
    nvme0n1       -      -      1  5.95K  7.99K  1.17G
    nvme1n1       -      -      0  6.01K      0  1.18G
    nvme2n1       -      -      0  5.93K      0  1.17G
    nvme3n1       -      -      0  5.88K      0  1.16G
    nvme4n1       -      -      1  5.95K  7.99K  1.17G
    nvme5n1       -      -      0  5.96K      0  1.17G
------------  -----  -----  -----  -----  -----  -----
</pre>
These are the test results:<br>
<br>
<tt>Fri Aug 18 16:54:48 PDT 2017 Obdfilter-survey for
case=disk from drp-tst-ffb01</tt><tt><br>
</tt><tt>ost 1 sz 10485760K rsz 1024K obj 1 thr 1
write<b> 7633.08 </b> SHORT rewrite
7558.78 SHORT read 3205.24 [3213.70, 3226.78]
</tt><tt><br>
</tt><tt>ost 1 sz 10485760K rsz 1024K obj 1 thr 2
write<b> 7996.89 </b> SHORT rewrite
7903.42 SHORT read 5264.70 SHORT
</tt><tt><br>
</tt><tt>ost 1 sz 10485760K rsz 1024K obj 2 thr 2
write <b>7718.94</b> SHORT rewrite
7977.84 SHORT read 5802.17 SHORT
</tt><tt><br>
</tt><br>
<ul>
<li>LNet self-test, and this is where I see the problem. For
reference, 172.21.52.[83,84] are the two OSSes and
172.21.52.86 is the reader/writer. Here is the script
that I ran:
</li>
</ul>
<p><tt>#!/bin/bash</tt><tt><br>
</tt><tt>export LST_SESSION=$$</tt><tt><br>
</tt><tt>lst new_session read_write</tt><tt><br>
</tt><tt>lst add_group servers 172.21.52.[83,84]@o2ib5</tt><tt><br>
</tt><tt>lst add_group readers 172.21.52.86@o2ib5</tt><tt><br>
</tt><tt>lst add_group writers 172.21.52.86@o2ib5</tt><tt><br>
</tt><tt>lst add_batch bulk_rw</tt><tt><br>
</tt><tt>lst add_test --batch bulk_rw --from readers --to
servers \</tt><tt><br>
</tt><tt>brw read check=simple size=1M</tt><tt><br>
</tt><tt>lst add_test --batch bulk_rw --from writers --to
servers \</tt><tt><br>
</tt><tt>brw write check=full size=1M</tt><tt><br>
</tt><tt># start running</tt><tt><br>
</tt><tt>lst run bulk_rw</tt><tt><br>
</tt><tt># display server stats for 30 seconds</tt><tt><br>
</tt><tt>lst stat servers & sleep 30; kill $!</tt><tt><br>
</tt><tt># tear down</tt><tt><br>
</tt><tt>lst end_session</tt><br>
</p>
<p>Here are the results:<br>
</p>
<p><tt>SESSION: read_write FEATURES: 1 TIMEOUT: 300 FORCE:
No</tt><tt><br>
</tt><tt>172.21.52.[83,84]@o2ib5 are added to session</tt><tt><br>
</tt><tt>172.21.52.86@o2ib5 are added to session</tt><tt><br>
</tt><tt>172.21.52.86@o2ib5 are added to session</tt><tt><br>
</tt><tt>Test was added successfully</tt><tt><br>
</tt><tt>Test was added successfully</tt><tt><br>
</tt><tt>bulk_rw is running now</tt><tt><br>
</tt><tt>[LNet Rates of servers]</tt><tt><br>
</tt><tt>[R] Avg: 1751 RPC/s Min: 0 RPC/s Max:
3502 RPC/s</tt><tt><br>
</tt><tt>[W] Avg: 2525 RPC/s Min: 0 RPC/s Max:
5050 RPC/s</tt><tt><br>
</tt><tt>[LNet Bandwidth of servers]</tt><tt><br>
</tt><tt>[R] Avg: 488.79 MiB/s Min: 0.00 MiB/s Max:
977.59 MiB/s </tt><tt><br>
</tt><tt>[W] Avg: 773.99 MiB/s Min: 0.00 MiB/s Max:
1547.99 MiB/s </tt><tt><br>
</tt><tt>[LNet Rates of servers]</tt><tt><br>
</tt><tt>[R] Avg: 1718 RPC/s Min: 0 RPC/s Max:
3435 RPC/s</tt><tt><br>
</tt><tt>[W] Avg: 2479 RPC/s Min: 0 RPC/s Max:
4958 RPC/s</tt><tt><br>
</tt><tt>[LNet Bandwidth of servers]</tt><tt><br>
</tt><tt>[R] Avg: 478.19 MiB/s Min: 0.00 MiB/s Max:
956.39 MiB/s </tt><tt><br>
</tt><tt>[W] Avg: 761.74 MiB/s Min: 0.00 MiB/s Max:
1523.47 MiB/s </tt><tt><br>
</tt><tt>[LNet Rates of servers]</tt><tt><br>
</tt><tt>[R] Avg: 1734 RPC/s Min: 0 RPC/s Max:
3467 RPC/s</tt><tt><br>
</tt><tt>[W] Avg: 2506 RPC/s Min: 0 RPC/s Max:
5012 RPC/s</tt><tt><br>
</tt><tt>[LNet Bandwidth of servers]</tt><tt><br>
</tt><tt>[R] Avg: 480.79 MiB/s Min: 0.00 MiB/s Max:
961.58 MiB/s </tt><tt><br>
</tt><tt>[W] Avg: 772.49 MiB/s Min: 0.00 MiB/s Max:
1544.98 MiB/s </tt><tt><br>
</tt><tt>[LNet Rates of servers]</tt><tt><br>
</tt><tt>[R] Avg: 1722 RPC/s Min: 0 RPC/s Max:
3444 RPC/s</tt><tt><br>
</tt><tt>[W] Avg: 2486 RPC/s Min: 0 RPC/s Max:
4972 RPC/s</tt><tt><br>
</tt><tt>[LNet Bandwidth of servers]</tt><tt><br>
</tt><tt>[R] Avg: 479.09 MiB/s Min: 0.00 MiB/s Max:
958.18 MiB/s </tt><tt><br>
</tt><tt>[W] Avg: 764.19 MiB/s Min: 0.00 MiB/s Max:
1528.38 MiB/s </tt><tt><br>
</tt><tt>[LNet Rates of servers]</tt><tt><br>
</tt><tt>[R] Avg: 1741 RPC/s Min: 0 RPC/s Max:
3482 RPC/s</tt><tt><br>
</tt><tt>[W] Avg: 2513 RPC/s Min: 0 RPC/s Max:
5025 RPC/s</tt><tt><br>
</tt><tt>[LNet Bandwidth of servers]</tt><tt><br>
</tt><tt>[R] Avg: 484.59 MiB/s Min: 0.00 MiB/s Max:
969.19 MiB/s </tt><tt><br>
</tt><tt>[W] Avg: 771.94 MiB/s Min: 0.00 MiB/s Max:
1543.87 MiB/s </tt><tt><br>
</tt><tt>session is ended</tt><tt><br>
</tt><tt>./lnet_test.sh: line 17: 4940
Terminated lst stat servers</tt><tt><br>
</tt><br>
</p>
So it looks like LNet is really underperforming: it delivers
half or less of what the InfiniBand link can do.<br>
How can I find out what is causing this?
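<p>Here is a rough way to put the lst numbers next to the ib_send_bw result shown below. The sketch averages the per-interval bandwidth samples for each direction and scales them to the whole servers group. It assumes that lst's "Avg" is the mean over the nodes in the group, which fits Min being 0 and Max being about twice Avg with two servers. It also compares MiB/s with perftest's MB/sec without converting units. <tt>summarise_lst_bw</tt> is a hypothetical helper name.</p>
<pre>
```shell
#!/bin/bash
# Hypothetical helper: summarise "lst stat" bandwidth lines read from stdin.
# Assumes "Avg" is the per-node mean over the group, so Avg * nodes is the
# group total. MiB/s is compared with perftest MB/sec without conversion.
#   $1 = number of nodes in the group, $2 = reference bandwidth (ib_send_bw)
summarise_lst_bw() {
  awk -v nodes="$1" -v ref="$2" '
    # Keep only bandwidth samples; the RPC/s rate lines are skipped.
    /MiB\/s/ {
      if ($1 == "[R]" || $1 == "[W]") {
        sum[$1] += $3   # $3 is the number after "Avg:"
        n[$1]++
      }
    }
    END {
      for (d in sum) {
        total = nodes * sum[d] / n[d]
        printf "%s %.1f %.1f\n", d, total, 100 * total / ref
      }
    }' | sort
}

# The five bandwidth samples from the run above. Output columns:
# direction, aggregate MiB/s, percent of the 3738.65 MB/sec ib_send_bw average.
printf '%s\n' \
  "[R] Avg: 488.79 MiB/s Min: 0.00 MiB/s Max: 977.59 MiB/s" \
  "[W] Avg: 773.99 MiB/s Min: 0.00 MiB/s Max: 1547.99 MiB/s" \
  "[R] Avg: 478.19 MiB/s Min: 0.00 MiB/s Max: 956.39 MiB/s" \
  "[W] Avg: 761.74 MiB/s Min: 0.00 MiB/s Max: 1523.47 MiB/s" \
  "[R] Avg: 480.79 MiB/s Min: 0.00 MiB/s Max: 961.58 MiB/s" \
  "[W] Avg: 772.49 MiB/s Min: 0.00 MiB/s Max: 1544.98 MiB/s" \
  "[R] Avg: 479.09 MiB/s Min: 0.00 MiB/s Max: 958.18 MiB/s" \
  "[W] Avg: 764.19 MiB/s Min: 0.00 MiB/s Max: 1528.38 MiB/s" \
  "[R] Avg: 484.59 MiB/s Min: 0.00 MiB/s Max: 969.19 MiB/s" \
  "[W] Avg: 771.94 MiB/s Min: 0.00 MiB/s Max: 1543.87 MiB/s" |
  summarise_lst_bw 2 3738.65
# [R] 964.6 25.8
# [W] 1537.7 41.1
```
</pre>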
<p>Running the perftest ib_send_bw test over InfiniBand, I get
good results:</p>
<p><tt>************************************</tt><tt><br>
</tt><tt>* Waiting for client to connect... *</tt><tt><br>
</tt><tt>************************************</tt><tt><br>
</tt><tt><br>
</tt><tt>---------------------------------------------------------------------------------------</tt><tt><br>
</tt><tt> Send BW Test</tt><tt><br>
</tt><tt> Dual-port : OFF Device :
mlx4_0</tt><tt><br>
</tt><tt> Number of qps : 1 Transport type : IB</tt><tt><br>
</tt><tt> Connection type : RC Using SRQ : OFF</tt><tt><br>
</tt><tt> RX depth : 512</tt><tt><br>
</tt><tt> CQ Moderation : 100</tt><tt><br>
</tt><tt> Mtu : 2048[B]</tt><tt><br>
</tt><tt> Link type : IB</tt><tt><br>
</tt><tt> Max inline data : 0[B]</tt><tt><br>
</tt><tt> rdma_cm QPs : OFF</tt><tt><br>
</tt><tt> Data ex. method : Ethernet</tt><tt><br>
</tt><tt>---------------------------------------------------------------------------------------</tt><tt><br>
</tt><tt> local address: LID 0x07 QPN 0x020f PSN 0xacc37a</tt><tt><br>
</tt><tt> remote address: LID 0x0a QPN 0x020f PSN 0x91a069</tt><tt><br>
</tt><tt>---------------------------------------------------------------------------------------</tt><tt><br>
</tt><tt> #bytes #iterations BW peak[MB/sec] BW
average[MB/sec] MsgRate[Mpps]</tt><tt><br>
</tt><tt>Conflicting CPU frequency values detected:
1249.234000 != 1326.000000. CPU Frequency is not max.</tt><tt><br>
</tt><tt> 2 1000 0.00
11.99 6.285330</tt><tt><br>
</tt><tt>Conflicting CPU frequency values detected:
1314.910000 != 1395.460000. CPU Frequency is not max.</tt><tt><br>
</tt><tt> 4 1000 0.00
28.26 7.409324</tt><tt><br>
</tt><tt>Conflicting CPU frequency values detected:
1314.910000 != 1460.207000. CPU Frequency is not max.</tt><tt><br>
</tt><tt> 8 1000 0.00
54.47 7.139164</tt><tt><br>
</tt><tt>Conflicting CPU frequency values detected:
1314.910000 != 1244.320000. CPU Frequency is not max.</tt><tt><br>
</tt><tt> 16 1000 0.00
113.13 7.413889</tt><tt><br>
</tt><tt>Conflicting CPU frequency values detected:
1314.910000 != 1460.207000. CPU Frequency is not max.</tt><tt><br>
</tt><tt> 32 1000 0.00
226.07 7.407811</tt><tt><br>
</tt><tt>Conflicting CPU frequency values detected:
1469.703000 != 1301.031000. CPU Frequency is not max.</tt><tt><br>
</tt><tt> 64 1000 0.00
452.12 7.407465</tt><tt><br>
</tt><tt>Conflicting CPU frequency values detected:
1469.703000 != 1301.031000. CPU Frequency is not max.</tt><tt><br>
</tt><tt> 128 1000 0.00
845.45 6.925918</tt><tt><br>
</tt><tt>Conflicting CPU frequency values detected:
1469.703000 != 1362.257000. CPU Frequency is not max.</tt><tt><br>
</tt><tt> 256 1000 0.00
1746.93 7.155406</tt><tt><br>
</tt><tt>Conflicting CPU frequency values detected:
1469.703000 != 1362.257000. CPU Frequency is not max.</tt><tt><br>
</tt><tt> 512 1000 0.00
2766.93 5.666682</tt><tt><br>
</tt><tt>Conflicting CPU frequency values detected:
1296.714000 != 1204.675000. CPU Frequency is not max.</tt><tt><br>
</tt><tt> 1024 1000 0.00
3516.26 3.600646</tt><tt><br>
</tt><tt>Conflicting CPU frequency values detected:
1296.714000 != 1325.535000. CPU Frequency is not max.</tt><tt><br>
</tt><tt> 2048 1000 0.00
3630.93 1.859035</tt><tt><br>
</tt><tt>Conflicting CPU frequency values detected:
1296.714000 != 1331.312000. CPU Frequency is not max.</tt><tt><br>
</tt><tt> 4096 1000 0.00
3702.39 0.947813</tt><tt><br>
</tt><tt>Conflicting CPU frequency values detected:
1296.714000 != 1200.027000. CPU Frequency is not max.</tt><tt><br>
</tt><tt> 8192 1000 0.00
3724.82 0.476777</tt><tt><br>
</tt><tt>Conflicting CPU frequency values detected:
1384.902000 != 1314.113000. CPU Frequency is not max.</tt><tt><br>
</tt><tt> 16384 1000 0.00
3731.21 0.238798</tt><tt><br>
</tt><tt>Conflicting CPU frequency values detected:
1578.078000 != 1200.027000. CPU Frequency is not max.</tt><tt><br>
</tt><tt> 32768 1000 0.00
3735.32 0.119530</tt><tt><br>
</tt><tt>Conflicting CPU frequency values detected:
1578.078000 != 1200.027000. CPU Frequency is not max.</tt><tt><br>
</tt><tt> 65536 1000 0.00
3736.98 0.059792</tt><tt><br>
</tt><tt>Conflicting CPU frequency values detected:
1578.078000 != 1200.027000. CPU Frequency is not max.</tt><tt><br>
</tt><tt> 131072 1000 0.00
3737.80 0.029902</tt><tt><br>
</tt><tt>Conflicting CPU frequency values detected:
1578.078000 != 1200.027000. CPU Frequency is not max.</tt><tt><br>
</tt><tt> 262144 1000 0.00
3738.43 0.014954</tt><tt><br>
</tt><tt>Conflicting CPU frequency values detected:
1570.507000 != 1200.027000. CPU Frequency is not max.</tt><tt><br>
</tt><tt> 524288 1000 0.00
3738.50 0.007477</tt><tt><br>
</tt><tt>Conflicting CPU frequency values detected:
1457.019000 != 1236.152000. CPU Frequency is not max.</tt><tt><br>
</tt><tt> 1048576 1000 0.00
3738.65 0.003739</tt><tt><br>
</tt><tt>Conflicting CPU frequency values detected:
1411.597000 != 1234.957000. CPU Frequency is not max.</tt><tt><br>
</tt><tt> 2097152 1000 0.00
3738.65 0.001869</tt><tt><br>
</tt><tt>Conflicting CPU frequency values detected:
1369.828000 != 1516.851000. CPU Frequency is not max.</tt><tt><br>
</tt><tt> 4194304 1000 0.00
3738.80 0.000935</tt><tt><br>
</tt><tt>Conflicting CPU frequency values detected:
1564.664000 != 1247.574000. CPU Frequency is not max.</tt><tt><br>
</tt><tt> 8388608 1000 0.00
3738.76 0.000467</tt><tt><br>
</tt><tt>---------------------------------------------------------------------------------------</tt><tt><br>
</tt><tt><br>
</tt></p>
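<p>To put "good results" into numbers, here is a back-of-the-envelope sketch that compares the 1 MiB ib_send_bw average with the theoretical data rate of the link. It assumes a 4x QDR port (10 Gb/s signalling per lane, 8b/10b encoding). It also treats perftest's MB/sec as 10^6 bytes/s. Both are assumptions, and the helper names are hypothetical.</p>
<pre>
```shell
#!/bin/bash
# Back-of-the-envelope link check. Assumptions: the HCA port is 4x QDR
# (10 Gb/s signalling per lane, 8b/10b encoding, so 8 of every 10 bits
# carry data) and perftest's MB/sec means 10^6 bytes per second.

qdr_data_rate_mb() {
  # Usable data rate in MB/s for a QDR link with $1 lanes:
  # lanes * 10 Gb/s signalling * 0.8 (8b/10b) / 8 bits per byte * 1000.
  awk -v lanes="$1" 'BEGIN { printf "%.0f\n", lanes * 10 * 0.8 / 8 * 1000 }'
}

link_efficiency_pct() {
  # $1 = measured MB/s, $2 = theoretical MB/s; prints a percentage.
  awk -v m="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * m / t }'
}

theory=$(qdr_data_rate_mb 4)
echo "4x QDR data rate: ${theory} MB/s"                                     # 4000
echo "1 MiB ib_send_bw average: $(link_efficiency_pct 3738.65 "$theory")%"  # 93.5
```
</pre>
<p>If perftest's MB is 2^20 bytes instead, the efficiency comes out a few percent higher. Either way, raw verbs traffic gets close to what the link can carry.</p>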
<p><tt>RDMA modules are loaded</tt><tt><br>
</tt><tt><br>
</tt><tt>rpcrdma 90366 0 </tt><tt><br>
</tt><tt>rdma_ucm 26837 0 </tt><tt><br>
</tt><tt>ib_uverbs 51854 2 ib_ucm,rdma_ucm</tt><tt><br>
</tt><tt>rdma_cm 53755 5
rpcrdma,ko2iblnd,ib_iser,rdma_ucm,ib_isert</tt><tt><br>
</tt><tt>ib_cm 47149 5
rdma_cm,ib_srp,ib_ucm,ib_srpt,ib_ipoib</tt><tt><br>
</tt><tt>iw_cm 46022 1 rdma_cm</tt><tt><br>
</tt><tt>ib_core 210381 15
rdma_cm,ib_cm,iw_cm,rpcrdma,ko2iblnd,mlx4_ib,ib_srp,ib_ucm,ib_iser,ib_srpt,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib,ib_isert</tt><tt><br>
</tt><tt>sunrpc 334343 17
nfs,nfsd,rpcsec_gss_krb5,auth_rpcgss,lockd,nfsv4,rpcrdma,nfs_acl</tt><tt><br>
</tt></p>
<p>I do not know where to look to make LNet perform
faster. My ib0 interface runs in connected mode
with a 65520 MTU.</p>
<p>Any hint would be much appreciated.</p>
<p>Thank you,</p>
<p>Rick</p>
On 8/18/17 9:05 AM, Mannthey, Keith wrote:<br>
</div>
<blockquote
cite="mid:E8BCA7842FE64F499E796131B007C2A1998EF7C7@FMSMSX114.amr.corp.intel.com"
type="cite">
<pre wrap="">I would suggest a few other tests to help isolate where the issue might be.
1. What is the single-thread dd write speed?
2. LNet self-test (lnet_selftest): please see "Chapter 28. Testing Lustre Network Performance (LNet Self-Test)" in the Lustre manual if this is a new test for you.
This will show how much LNet bandwidth you have from your single client. There are tunables in the LNet layer that can affect this. Which QDR HCA are you using?
3. obdfilter-survey: please see "29.3. Testing OST Performance (obdfilter-survey)" in the Lustre manual. This test will show what the NVMe/ZFS backend can do at the OBD layer in Lustre.
Thanks,
Keith
-----Original Message-----
From: lustre-discuss [<a moz-do-not-send="true" class="moz-txt-link-freetext" href="mailto:lustre-discuss-bounces@lists.lustre.org">mailto:lustre-discuss-bounces@lists.lustre.org</a>] On Behalf Of Riccardo Veraldi
Sent: Thursday, August 17, 2017 10:48 PM
To: Dennis Nelson <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:dnelson@ddn.com"><dnelson@ddn.com></a>; <a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:lustre-discuss@lists.lustre.org">lustre-discuss@lists.lustre.org</a>
Subject: Re: [lustre-discuss] Lustre poor performance
This is my lustre.conf:
[drp-tst-ffb01:~]$ cat /etc/modprobe.d/lustre.conf
options lnet networks=o2ib5(ib0),tcp5(enp1s0f0)
Data transfer is over InfiniBand:
ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 65520
inet 172.21.52.83 netmask 255.255.252.0 broadcast 172.21.55.255
On 8/17/17 10:45 PM, Riccardo Veraldi wrote:
</pre>
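<p>For item 1, here is a minimal sketch of a single-thread dd write measurement, assuming GNU coreutils dd. The function name and the example path are hypothetical. When used for real, the path should point at a file on the Lustre client mount.</p>
<pre>
```shell
#!/bin/bash
# Hypothetical helper for a single-thread dd write test (GNU coreutils dd).
#   $1 = file to write (on the Lustre client mount when used for real)
#   $2 = amount to write, in MiB
single_stream_write() {
  local target=$1 size_mib=$2
  # One dd process means one writer thread issuing 1 MiB writes.
  # conv=fsync flushes the file data before dd prints its statistics, so
  # the rate is not just the speed of filling the client page cache.
  # dd reports on stderr; keep only its final summary line.
  dd if=/dev/zero of="$target" bs=1M count="$size_mib" conv=fsync 2>&1 | tail -n 1
}

# Example with a hypothetical path: write 10 GiB from a single thread.
# single_stream_write /mnt/lustre/dd_single_stream 10240
```
</pre>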
<blockquote type="cite">
<pre wrap="">On 8/17/17 9:22 PM, Dennis Nelson wrote:
</pre>
<blockquote type="cite">
<pre wrap="">It appears that you are running iozone on a single client? What kind of network is tcp5? Have you looked at the network to make sure it is not the bottleneck?
</pre>
</blockquote>
<pre wrap="">Yes, the data transfer is on the ib0 interface, and a memory-to-memory
test over InfiniBand QDR gave 3.7 GB/sec.
tcp is used to connect to the MDS. It is called tcp5 to distinguish it from
my many other Lustre clusters; I could have called it tcp, but that makes
no difference performance-wise.
Yes, I ran the test from one single node. I also ran the same test
locally on a zpool identical to the one on the Lustre OSS.
I have 4 identical servers, each of them with the same NVMe disks:
server1: OSS - OST1 Lustre/ZFS raidz1
server2: OSS - OST2 Lustre/ZFS raidz1
server3: local ZFS raidz1
server4: Lustre client
_______________________________________________
lustre-discuss mailing list
<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:lustre-discuss@lists.lustre.org">lustre-discuss@lists.lustre.org</a>
<a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org">http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org</a>
</pre>
</blockquote>
</blockquote>
</div>
</blockquote>
</blockquote>
</body>
</html>