<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Raj,</p>
<p>Thanks for the insight.<br>
It looks like it was the buffer size. The rx buffer was increased
on the Lustre nodes, and there have been no more dropped packets.</p>
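For reference, on a typical Linux node the NIC receive buffers can be inspected and enlarged with ethtool or sysctl; the interface name and sizes below are hypothetical examples, not the exact commands used here.

```shell
# Hypothetical interface name; check the NIC's reported maximums first.
ethtool -g eth0                      # show current and maximum rx/tx ring sizes
ethtool -G eth0 rx 4096              # enlarge the NIC rx ring toward its maximum
# If it was the socket receive buffer instead, the sysctl route looks like:
sysctl -w net.core.rmem_max=16777216
```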
<p>Brian Andrus<br>
</p>
<br>
<div class="moz-cite-prefix">On 12/5/2017 11:12 AM, Raj wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CANF66k-G6WSMuzf3onuY9rrndx4KL4bjPquB9b0+q2zo4aJo-w@mail.gmail.com">Brian,
<br>
I would check the following:<br>
- The MTU size must be the same across all the nodes
(servers + clients)<br>
- peer_credits and credits must be the same across all the
nodes<br>
- /proc/sys/lnet/peers can show whether you are constantly
seeing negative credits<br>
- Check the buffer overflow counters on the switches, if
they provide them. If the buffer size is too low to handle
the IO stream, you may want to reduce credits.<br>
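The negative-credit check above can be sketched as follows. The sample lines are an approximation of /proc/sys/lnet/peers output (columns: nid, refs, state, last, max, rtr, min, tx, min, queue); on a live node you would read the real file instead of the sample.

```shell
# Write an approximate /proc/sys/lnet/peers sample to a file for illustration.
printf '%s\n' \
  'nid              refs state last max rtr min tx min queue' \
  '10.0.0.1@tcp        1   up   -1    8   8   8  8  -4     0' \
  '10.0.0.2@tcp        1   up   -1    8   8   8  8   6     0' \
  > peers.sample

# Field 9 is the minimum tx credit seen; a persistently negative value means
# the peer has queued more requests than it had credits for.
awk 'NR>1 { if ($9 ~ /^-/) print $1 }' peers.sample
```

Run against the real file on an MGS/OSS/client, any NID this prints is a peer that has gone into negative credits.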
<br>
-Raj<br>
<br>
<br>
<div class="gmail_quote">
<div dir="ltr">On Tue, Dec 5, 2017 at 11:56 AM Brian Andrus <<a
href="mailto:toomuchit@gmail.com" moz-do-not-send="true">toomuchit@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<p>Shawn,</p>
<p>Flow control is configured and these connections are all
on the same 40g subnet and all directly connected to the
same switch.</p>
<p>I'm a little new to using lnet_selftest, but when I run
it 1:1, I do see the dropped-packet count go up significantly
on the client node. The node I set as the server does not
drop any packets.<br>
</p>
</div>
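For anyone reproducing this, a minimal 1:1 lnet_selftest write run looks roughly like the sketch below; the NIDs are hypothetical placeholders for the client and server nodes.

```shell
# Hypothetical NIDs; run from a node with the lnet_selftest module loaded.
export LST_SESSION=$$
lst new_session dropped_pkts
lst add_group clients 10.0.0.10@tcp
lst add_group servers 10.0.0.20@tcp
lst add_batch bulk_write
# Bulk write test from the client group to the server group, 1 MB transfers.
lst add_test --batch bulk_write --from clients --to servers brw write size=1M
lst run bulk_write
sleep 30
lst stat clients servers    # compare throughput and errors per group
lst end_session
```

Watching the NIC drop counters on each node while this runs shows which side is losing packets.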
<div text="#000000" bgcolor="#FFFFFF">
<p>Brian Andrus<br>
</p>
</div>
<div text="#000000" bgcolor="#FFFFFF"> <br>
<div class="m_-2156303820911976864moz-cite-prefix">On
12/5/2017 9:20 AM, Shawn Hall wrote:<br>
</div>
<blockquote type="cite"> Hi Brian,<br>
<br>
Do you have flow control configured on all ports along the
network path? Lustre has a tendency to cause packet losses
in ways that performance-testing tools don’t, because of its
N-to-1 packet flows, so flow control is often necessary.
lnet_selftest should replicate this behavior.<br>
<br>
Is there a point in the network path where the link
bandwidth changes (e.g. 40 GbE down to 10 GbE, or 2x40 GbE
down to 1x40 GbE)? That will commonly be the biggest point
of loss if flow control isn’t doing its job.<br>
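Pause-frame settings and loss counters can be checked per interface along the path; the interface name below is a hypothetical example.

```shell
# Hypothetical interface name; run on each host along the network path.
ethtool -a eth0                                  # autonegotiate / rx / tx pause state
ethtool -S eth0 | grep -iE 'pause|drop|discard'  # NIC counters that reveal loss
ip -s link show eth0                             # kernel-level rx/tx drop counts
```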
<br>
Shawn<br>
<br>
On 12/5/17, 11:49 AM, "lustre-discuss on behalf of <a
class="m_-2156303820911976864moz-txt-link-abbreviated"
href="mailto:jongwoohan@naver.com" target="_blank"
moz-do-not-send="true">jongwoohan@naver.com</a>" <a
class="m_-2156303820911976864moz-txt-link-rfc2396E"
href="mailto:lustre-discuss-bounces@lists.lustre.org"
target="_blank" moz-do-not-send="true"><lustre-discuss-bounces@lists.lustre.org
on behalf of jongwoohan@naver.com></a> wrote:<br>
<br>
Did you check your connection with iperf and iperf3 using
TCP bandwidth tests? In that case, these tools cannot detect
packet drops, because TCP retransmits them transparently.<br>
<br>
Try checking your block-device backend with benchmark tools
like vdbench or bonnie++. Sometimes a bad block device
causes incorrect data transfers.<br>
<br>
-----Original Message-----<br>
From: "Brian Andrus"<a
class="m_-2156303820911976864moz-txt-link-rfc2396E"
href="mailto:toomuchit@gmail.com" target="_blank"
moz-do-not-send="true"><toomuchit@gmail.com></a> <br>
To:
<a class="m_-2156303820911976864moz-txt-link-rfc2396E"
href="mailto:lustre-discuss@lists.lustre.org"
target="_blank" moz-do-not-send="true">"lustre-discuss@lists.lustre.org"</a><a
class="m_-2156303820911976864moz-txt-link-rfc2396E"
href="mailto:lustre-discuss@lists.lustre.org"
target="_blank" moz-do-not-send="true"><lustre-discuss@lists.lustre.org></a>;
<br>
Cc: <br>
Sent: 2017-12-06 (Wed) 01:38:04<br>
Subject: [lustre-discuss] lustre causing dropped packets<br>
<br>
All,<br>
<br>
I have a small setup I am testing (1 MGS, 2 OSS) that is
connected via <br>
40G ethernet.<br>
<br>
I notice that anything that writes to the Lustre filesystem
causes dropped packets. Reads do not seem to cause this. I
have also tested the network (iperf, iperf3, general
traffic) with no dropped packets.<br>
<br>
Is there something with writes that can cause dropped
packets?<br>
<br>
<br>
Brian Andrus<br>
<br>
_______________________________________________<br>
lustre-discuss mailing list<br>
<a href="mailto:lustre-discuss@lists.lustre.org"
target="_blank" moz-do-not-send="true">lustre-discuss@lists.lustre.org</a><br>
<a
href="http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org"
target="_blank" moz-do-not-send="true">http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org</a><br>
</blockquote>
<br>
</div>
</blockquote>
</div>
</blockquote>
<br>
</body>
</html>