Brian,

I would check the following (a quick way to check the first three is sketched below):
- The MTU size must be the same across all the nodes (servers + client).
- peer_credits and credits must be the same across all the nodes.
- /proc/sys/lnet/peers will show whether you are constantly seeing negative credits.
- Buffer overflow counters on the switches, if they provide them. If the buffer size is too low to handle the IO stream, you may want to reduce the credits.

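Something like this on each node, assuming the socklnd (ksocklnd) driver on your Ethernet fabric; eth0 is a placeholder for your 40G interface:

    # MTU must match on every server and client
    ip link show eth0 | grep mtu

    # credits and peer_credits must match across all nodes
    cat /sys/module/ksocklnd/parameters/credits
    cat /sys/module/ksocklnd/parameters/peer_credits

    # watch for peers whose credits go negative under load
    watch -n 1 'cat /proc/sys/lnet/peers'
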
<div text="#000000" bgcolor="#FFFFFF">
Shawn,

Flow control is configured, and these connections are all on the same 40G subnet, all directly connected to the same switch.

I'm a little new to using lnet_selftest, but when I run it 1:1, I see the dropped packets go up pretty significantly on the client node. The node I set as the server does not drop any packets.

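For the record, the 1:1 run is essentially the session below (the NIDs are placeholders for my client and server):

    export LST_SESSION=$$
    lst new_session drop_test
    lst add_group clients 192.168.1.10@tcp
    lst add_group servers 192.168.1.20@tcp
    lst add_batch bulk_rw
    # 1M bulk writes, mimicking Lustre write traffic
    lst add_test --batch bulk_rw --from clients --to servers brw write size=1M
    lst run bulk_rw
    lst stat clients servers
    lst end_session
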
<p>Brian Andrus<br>
</p></div><div text="#000000" bgcolor="#FFFFFF">
On 12/5/2017 9:20 AM, Shawn Hall wrote:

<blockquote type="cite">
Hi
Brian,<br>
<br>
Do you have flow control configured on all ports that are on the network path? Lustre has a tendency to cause packet losses in ways that performance testing tools don't, because of its N-to-1 packet flows, so flow control is often necessary. lnet_selftest should replicate this behavior.

Is there a point in the network path where the link bandwidth changes (e.g. 40 GbE down to 10 GbE, or 2x40 GbE down to 1x40 GbE)? That will commonly be the biggest point of loss if flow control isn't doing its job.

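Both are quick to check with ethtool; eth0 below is a placeholder for the 40 GbE interface:

    # is pause-frame flow control actually negotiated on this port?
    ethtool -a eth0

    # confirm the negotiated link speed at each hop
    ethtool eth0 | grep -i Speed

    # enable RX/TX pause if it is off (the switch port must honor it too)
    ethtool -A eth0 rx on tx on
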
Shawn

On 12/5/17, 11:49 AM, "lustre-discuss on behalf of jongwoohan@naver.com" <lustre-discuss-bounces@lists.lustre.org on behalf of jongwoohan@naver.com> wrote:

Did you check your connection with iperf and iperf3 in TCP mode? In that case, these tools cannot detect packet drops, because TCP retransmission hides them.

Try checking your block device backend's reliability with benchmark tools like vdbench or bonnie++. Sometimes a bad block device causes incorrect data transfers.

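For example (the server address, test directory, and sizes below are placeholders):

    # a UDP run reports packet loss directly, unlike a TCP bandwidth test
    iperf3 -c oss1 -u -b 10G -t 30

    # exercise the OST backend filesystem directly on the OSS;
    # -s should be roughly 2x RAM to defeat caching
    bonnie++ -d /mnt/ost0_test -s 64g -u root
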
-----Original Message-----
From: "Brian Andrus" <toomuchit@gmail.com>
To: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
Cc:
Sent: 2017-12-06 (Wed) 01:38:04
Subject: [lustre-discuss] lustre causing dropped packets

All,

I have a small setup I am testing (1 MGS, 2 OSS) that is connected via 40G Ethernet.

I notice that anything I run that writes to the Lustre filesystem causes dropped packets. Reads do not seem to cause this. I have also tested the network (iperf, iperf3, general traffic) with no dropped packets.

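The counters I am watching (eth0 is a placeholder for my 40G interface):

    # kernel-level RX/TX drop counters
    ip -s link show eth0

    # NIC/driver-level counters narrow down where the drops happen
    ethtool -S eth0 | grep -iE 'drop|discard|pause'
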
Is there something with writes that can cause dropped packets?

Brian Andrus

<p style="font-family:Verdana;font-size:10pt;color:#666666"><b>Disclaimer</b></p>
<p style="font-family:Verdana;font-size:8pt;color:#666666">This
e-mail has been scanned for all viruses and malware, and may
have been automatically archived by Mimecast Ltd, an innovator
in Software as a Service (SaaS) for business.</p>
</blockquote>
<br>
</div>
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org