<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Thanks for the pointers.<br>
<br>
Flow control has limited impact at this point (no change under lnet_selftest, and a ~10% drop under iperf when it is disabled).<br>
All machines have tcp_sack enabled.<br>
Checksums don't seem to make a difference either.<br>
Bumping up max_rpcs_in_flight didn't improve much, but it seems to have made the write speed more consistent.<br>
Raising max_read_ahead_mb had no effect on read performance.<br>
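<br>
For reference, the current client-side values were checked with commands roughly like these (a sketch only; the interface name and the parameter globs are placeholders, adjust them to your NICs and filesystem):<br>
<br>
# ethtool -a eth0                    (pause / flow control state)<br>
# sysctl net.ipv4.tcp_sack           (selective ack)<br>
# lctl get_param osc.*.checksums<br>
# lctl get_param osc.*.max_rpcs_in_flight<br>
# lctl get_param llite.*.max_read_ahead_mb<br>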
<br>
At this point I am struggling to understand what actually affects reads.<br>
iperf between the clients and the OSS gives a combined bandwidth that reaches ~90% of link capacity (43.7GB/s), but lnet_selftest maxes out at ~14GB/s, so about 28%.<br>
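<br>
(For completeness: the iperf figure is the sum of parallel client runs along the lines of "iperf3 -c &lt;oss_ip&gt; -P 4 -t 60", and the lnet_selftest figure comes from a brw read script roughly like the one below. The NIDs, group names and runtime are placeholders, not the exact values used.)<br>
<br>
export LST_SESSION=$$<br>
lst new_session read_test<br>
lst add_group clients 10.0.0.[1-133]@tcp<br>
lst add_group servers 10.0.1.1@tcp<br>
lst add_batch bulk_read<br>
lst add_test --batch bulk_read --from clients --to servers brw read check=simple size=1M<br>
lst run bulk_read<br>
lst stat clients servers &amp; sleep 30; kill $!<br>
lst end_session<br>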
<br>
Any clues as to which LNet tunables / settings could have an impact here?<br>
<br>
Best regards,<br>
Louis<br>
<br>
On 13/08/2019 12:53, Raj wrote:<br>
</div>
<blockquote type="cite" cite="mid:CANF66k_qtwzwTKxLSqMA9HuH7kDifJidXWeFpgAi=0spO8LPcw@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">Louis,
<div>I would also try:</div>
<div>- turning on selective ack (net.ipv4.tcp_sack=1) on all nodes. This helps although there is a CVE out there for older kernels.</div>
<div>- turning off checksums (osc.ostid*.checksums). This can be turned off per OST/FS on clients.</div>
<div>- Increasing max_pages_per_rpc to 16M, although this may not help with your reads.</div>
<div>- Increasing max_rpcs_in_flight, and setting max_dirty_mb to 2 x max_rpcs_in_flight.</div>
<div>- Increasing llite.ostid*.max_read_ahead_mb to up to 1024 on clients. Again this can be set per OST/FS. Example commands are sketched below.</div>
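<div><br>
</div>
<div>A rough sketch of how these could be applied on a 2.12 client with lctl (the "*" globs and the pages-per-RPC value are assumptions; match them to your fsname/OST naming and page size):</div>
<div># sysctl -w net.ipv4.tcp_sack=1</div>
<div># lctl set_param osc.*.checksums=0</div>
<div># lctl set_param osc.*.max_pages_per_rpc=4096     (4096 x 4K pages = 16M RPCs)</div>
<div># lctl set_param osc.*.max_rpcs_in_flight=32 osc.*.max_dirty_mb=64</div>
<div># lctl set_param llite.*.max_read_ahead_mb=1024</div>
<div>(lctl set_param is not persistent across remounts; "lctl set_param -P" on the MGS, or the corresponding conf_param, would make the settings stick.)</div>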
<div><br>
</div>
<div>_Raj</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, Aug 12, 2019 at 12:12 PM Shawn Hall <<a href="mailto:shawn.hall@nag.com" moz-do-not-send="true">shawn.hall@nag.com</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div bgcolor="white" lang="EN-US">
<div class="gmail-m_-346298009108846501WordSection1">
<p class="MsoNormal"><span style="color:windowtext">Do you have Ethernet flow control configured on all ports (especially the uplink ports)? We’ve found that flow control is critical when there are mismatched uplink/client port speeds.</span></p>
<p class="MsoNormal"><span style="color:windowtext"> </span></p>
<p class="MsoNormal"><span style="color:windowtext">Shawn</span></p>
<p class="MsoNormal"><span style="color:windowtext"> </span></p>
<div>
<div style="border-style:solid none
none;border-top-width:1pt;border-top-color:rgb(225,225,225);padding:3pt
0in 0in">
<p class="MsoNormal"><b><span style="color:windowtext">From:</span></b><span style="color:windowtext"> lustre-discuss <<a href="mailto:lustre-discuss-bounces@lists.lustre.org" target="_blank" moz-do-not-send="true">lustre-discuss-bounces@lists.lustre.org</a>>
<b>On Behalf Of </b>Louis Bailleul<br>
<b>Sent:</b> Monday, August 12, 2019 1:08 PM<br>
<b>To:</b> <a href="mailto:lustre-discuss@lists.lustre.org" target="_blank" moz-do-not-send="true">
lustre-discuss@lists.lustre.org</a><br>
<b>Subject:</b> [lustre-discuss] Very bad lnet ethernet read performance</span></p>
</div>
</div>
<p class="MsoNormal"> </p>
<p class="MsoNormal"><span style="font-family:Helvetica,sans-serif">Hi all,<br>
<br>
I am trying to understand what I am doing wrong here.<br>
I have a Lustre 2.12.1 system backed by NVMe drives under ZFS, for which obdfilter-survey gives decent values:</span></p>
<blockquote style="margin-top:5pt;margin-bottom:5pt">
<p class="MsoNormal"><span style="font-family:Helvetica,sans-serif">ost 2 sz 536870912K rsz 1024K obj 2 thr 256 write 15267.49 [6580.36, 8664.20] rewrite 15225.24 [6559.05, 8900.54] read 19739.86 [9062.25, 10429.04]
</span></p>
</blockquote>
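<p class="MsoNormal"><span style="font-family:Helvetica,sans-serif">(For reference, the survey above was driven by an obdfilter-survey invocation roughly like the one below; the target names are placeholders and the size/thread/object counts are simply read back from the output line.)</span></p>
<blockquote style="margin-top:5pt;margin-bottom:5pt">
<p class="MsoNormal"><span style="font-family:Helvetica,sans-serif"># thrlo=256 thrhi=256 nobjlo=2 nobjhi=2 rszlo=1024 rszhi=1024 size=262144 targets="fsname-OST0000 fsname-OST0001" case=disk obdfilter-survey</span></p>
</blockquote>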
<p class="MsoNormal"><span style="font-family:Helvetica,sans-serif">But my actual Lustre performances are pretty poor in comparison (can't top 8GB/s write and 13.5GB/s read)<br>
So I started to question my lnet tuning but playing with peer_credits and max_rpc_per_pages didn't help.<br>
</span><br>
<span style="font-family:Helvetica,sans-serif">My test setup consist of 133x10G Ethernet clients (uplinks between end devices and OSS are 2x100G for every 20 nodes).<br>
The single OSS is fitted with a bonding of 2x100G Ethernet.<br>
<br>
I have tried to understand the problem using lnet_selftest but I'll need some help/doco as this doesn't make sense to me.<br>
<br>
Testing a single 10G client</span></p>
<blockquote style="margin-top:5pt;margin-bottom:5pt">
<p class="MsoNormal"><span style="font-family:Helvetica,sans-serif">[LNet Rates of lfrom]</span><br>
<span style="font-family:Helvetica,sans-serif">[R] Avg: 2231 RPC/s Min: 2231 RPC/s Max: 2231 RPC/s</span><br>
<span style="font-family:Helvetica,sans-serif">[W] Avg: 1156 RPC/s Min: 1156 RPC/s Max: 1156 RPC/s</span><br>
<span style="font-family:Helvetica,sans-serif">[LNet Bandwidth of lfrom]</span><br>
<span style="font-family:Helvetica,sans-serif">[R] Avg: 1075.16 MiB/s Min: 1075.16 MiB/s Max: 1075.16 MiB/s
</span><br>
<span style="font-family:Helvetica,sans-serif">[W] Avg: 0.18 MiB/s Min: 0.18 MiB/s Max: 0.18 MiB/s
</span><br>
<span style="font-family:Helvetica,sans-serif">[LNet Rates of lto]</span><br>
<span style="font-family:Helvetica,sans-serif">[R] Avg: 1179 RPC/s Min: 1179 RPC/s Max: 1179 RPC/s</span><br>
<span style="font-family:Helvetica,sans-serif">[W] Avg: 2254 RPC/s Min: 2254 RPC/s Max: 2254 RPC/s</span><br>
<span style="font-family:Helvetica,sans-serif">[LNet Bandwidth of lto]</span><br>
<span style="font-family:Helvetica,sans-serif">[R] Avg: 0.19 MiB/s Min: 0.19 MiB/s Max: 0.19 MiB/s
</span><br>
<span style="font-family:Helvetica,sans-serif">[W] Avg: 1075.17 MiB/s Min: 1075.17 MiB/s Max: 1075.17 MiB/s
</span></p>
</blockquote>
<p class="MsoNormal"><span style="font-family:Helvetica,sans-serif">With 10x10G clients :</span></p>
<blockquote style="margin-top:5pt;margin-bottom:5pt">
<p class="MsoNormal"><span style="font-family:Helvetica,sans-serif">[LNet Rates of lfrom]<br>
[R] Avg: 1416 RPC/s Min: 1102 RPC/s Max: 1642 RPC/s<br>
[W] Avg: 708 RPC/s Min: 551 RPC/s Max: 821 RPC/s<br>
[LNet Bandwidth of lfrom]<br>
[R] Avg: 708.20 MiB/s Min: 550.77 MiB/s Max: 820.96 MiB/s <br>
[W] Avg: 0.11 MiB/s Min: 0.08 MiB/s Max: 0.13 MiB/s <br>
[LNet Rates of lto]<br>
[R] Avg: 7084 RPC/s Min: 7084 RPC/s Max: 7084 RPC/s<br>
[W] Avg: 14165 RPC/s Min: 14165 RPC/s Max: 14165 RPC/s<br>
[LNet Bandwidth of lto]<br>
[R] Avg: 1.08 MiB/s Min: 1.08 MiB/s Max: 1.08 MiB/s <br>
[W] Avg: 7081.86 MiB/s Min: 7081.86 MiB/s Max: 7081.86 MiB/s </span></p>
</blockquote>
<p class="MsoNormal"><span style="font-family:Helvetica,sans-serif"><br>
With all 133x10G clients:</span></p>
<blockquote style="margin-top:5pt;margin-bottom:5pt">
<p class="MsoNormal"><span style="font-family:Helvetica,sans-serif">[LNet Rates of lfrom]<br>
[R] Avg: 510 RPC/s Min: 98 RPC/s Max: 23457 RPC/s<br>
[W] Avg: 510 RPC/s Min: 49 RPC/s Max: 45863 RPC/s<br>
[LNet Bandwidth of lfrom]<br>
[R] Avg: 169.87 MiB/s Min: 48.77 MiB/s Max: 341.26 MiB/s <br>
[W] Avg: 169.86 MiB/s Min: 0.01 MiB/s Max: 22757.92 MiB/s <br>
[LNet Rates of lto]<br>
[R] Avg: 23458 RPC/s Min: 23458 RPC/s Max: 23458 RPC/s<br>
[W] Avg: 45876 RPC/s Min: 45876 RPC/s Max: 45876 RPC/s<br>
[LNet Bandwidth of lto]<br>
[R] Avg: 341.12 MiB/s Min: 341.12 MiB/s Max: 341.12 MiB/s <br>
[W] Avg: 22758.42 MiB/s Min: 22758.42 MiB/s Max: 22758.42 MiB/s </span></p>
</blockquote>
<p class="MsoNormal" style="margin-bottom:12pt"><br>
<span style="font-family:Helvetica,sans-serif">So if I add clients the aggregate write bandwidth somewhat stacks, but the read bandwidth decrease ???<br>
When throwing all the nodes at the system, I am pretty happy with the ~22GB/s on write pretty as this is in the 90% of the 2x100G, but the 341MB/s read sounds very weird considering that this is a third of the performance of a single client.<br>
<br>
This are my ksocklnd tuning :</span></p>
<blockquote style="margin-top:5pt;margin-bottom:5pt">
<p class="MsoNormal" style="margin-bottom:12pt"><span style="font-family:Helvetica,sans-serif"># for i in /sys/module/ksocklnd/parameters/*; do echo "$i : $(cat $i)"; done<br>
/sys/module/ksocklnd/parameters/credits : 1024<br>
/sys/module/ksocklnd/parameters/eager_ack : 0<br>
/sys/module/ksocklnd/parameters/enable_csum : 0<br>
/sys/module/ksocklnd/parameters/enable_irq_affinity : 0<br>
/sys/module/ksocklnd/parameters/inject_csum_error : 0<br>
/sys/module/ksocklnd/parameters/keepalive : 30<br>
/sys/module/ksocklnd/parameters/keepalive_count : 5<br>
/sys/module/ksocklnd/parameters/keepalive_idle : 30<br>
/sys/module/ksocklnd/parameters/keepalive_intvl : 5<br>
/sys/module/ksocklnd/parameters/max_reconnectms : 60000<br>
/sys/module/ksocklnd/parameters/min_bulk : 1024<br>
/sys/module/ksocklnd/parameters/min_reconnectms : 1000<br>
/sys/module/ksocklnd/parameters/nagle : 0<br>
/sys/module/ksocklnd/parameters/nconnds : 4<br>
/sys/module/ksocklnd/parameters/nconnds_max : 64<br>
/sys/module/ksocklnd/parameters/nonblk_zcack : 1<br>
/sys/module/ksocklnd/parameters/nscheds : 12<br>
/sys/module/ksocklnd/parameters/peer_buffer_credits : 0<br>
/sys/module/ksocklnd/parameters/peer_credits : 128<br>
/sys/module/ksocklnd/parameters/peer_timeout : 180<br>
/sys/module/ksocklnd/parameters/round_robin : 1<br>
/sys/module/ksocklnd/parameters/rx_buffer_size : 0<br>
/sys/module/ksocklnd/parameters/sock_timeout : 50<br>
/sys/module/ksocklnd/parameters/tx_buffer_size : 0<br>
/sys/module/ksocklnd/parameters/typed_conns : 1<br>
/sys/module/ksocklnd/parameters/zc_min_payload : 16384<br>
/sys/module/ksocklnd/parameters/zc_recv : 0<br>
/sys/module/ksocklnd/parameters/zc_recv_min_nfrags : 16</span></p>
</blockquote>
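<p class="MsoNormal"><span style="font-family:Helvetica,sans-serif">(These are ksocklnd module parameters, so the tuned ones above were set through an /etc/modprobe.d entry roughly like the line below; illustrative only, it just mirrors a few of the values listed.)</span></p>
<blockquote style="margin-top:5pt;margin-bottom:5pt">
<p class="MsoNormal"><span style="font-family:Helvetica,sans-serif">options ksocklnd credits=1024 peer_credits=128 nscheds=12</span></p>
</blockquote>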
<p class="MsoNormal"><span style="font-family:Helvetica,sans-serif">Best regards,<br>
Louis</span></p>
</div>
</div>
</blockquote>
</div>
</blockquote>
<br>
</body>
</html>