<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>Raj,</p>
    <p>Thanks for the insight.<br>
      It looks like it was the buffer size. The RX ring buffer was increased
      on the Lustre nodes and there have been no more dropped packets.</p>
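    <p>For the archives, the fix was along these lines (a rough sketch
      only; the interface name and ring size below are placeholders, not
      the exact values from our nodes):</p>
    <pre>
# Show current and maximum RX/TX ring sizes for the NIC
ethtool -g enp4s0f0

# Raise the RX ring toward the hardware maximum reported above
ethtool -G enp4s0f0 rx 4096

# Confirm the drop counters stop climbing afterwards
ip -s link show dev enp4s0f0
</pre>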
    <p>Brian Andrus<br>
    </p>
    <br>
    <div class="moz-cite-prefix">On 12/5/2017 11:12 AM, Raj wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CANF66k-G6WSMuzf3onuY9rrndx4KL4bjPquB9b0+q2zo4aJo-w@mail.gmail.com">Brian,
      <br>
      I would check the following (a rough command sketch follows below):<br>
      - MTU size must be the same across all the nodes (servers + clients)<br>
      - peer_credits and credits must be the same across all the nodes<br>
      - /proc/sys/lnet/peers can show whether you are constantly seeing
      negative credits<br>
      - Buffer overflow counters on the switches, if they provide them. If the
      buffer size is too low to handle the IO stream, you may want to reduce
      credits.<br>
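      <br>
      Roughly, the checks look like this (the interface name is a
      placeholder; adjust for your nodes):<br>
      <pre>
# MTU on every server and client (the values should all match)
ip link show dev enp4s0f0 | grep -o 'mtu [0-9]*'

# credits / peer_credits module parameters for the socklnd (TCP) LND
cat /sys/module/ksocklnd/parameters/credits
cat /sys/module/ksocklnd/parameters/peer_credits

# Watch for peers that keep dipping into negative credits
watch -n 1 cat /proc/sys/lnet/peers
</pre>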
      <br>
      -Raj<br>
      <br>
      <br>
      <div class="gmail_quote">
        <div dir="ltr">On Tue, Dec 5, 2017 at 11:56 AM Brian Andrus <<a
            href="mailto:toomuchit@gmail.com" moz-do-not-send="true">toomuchit@gmail.com</a>>
          wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0 0 0
          .8ex;border-left:1px #ccc solid;padding-left:1ex">
          <div text="#000000" bgcolor="#FFFFFF">
            <p>Shawn,</p>
            <p>Flow control is configured, and these connections are all
              on the same 40G subnet and directly connected to the
              same switch.</p>
            <p>I'm a little new to lnet_selftest, but when I run it 1:1,
              dropped packets go up pretty significantly on the client
              node. The node I set up as the server does not drop any
              packets.<br>
            </p>
          </div>
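          <div text="#000000" bgcolor="#FFFFFF">
            <p>For reference, the 1:1 run was roughly the following (the
              NIDs and interface name are placeholders, not our real
              addresses):</p>
            <pre>
# lnet_selftest module must be loaded on both test nodes
modprobe lnet_selftest
export LST_SESSION=$$

lst new_session drop_test
lst add_group clients 192.168.1.10@tcp    # client under test
lst add_group servers 192.168.1.20@tcp    # OSS acting as selftest server

lst add_batch bulk_rw
lst add_test --batch bulk_rw --from clients --to servers brw write size=1M

lst run bulk_rw
lst stat clients servers    # prints rates until interrupted (Ctrl-C)
lst end_session

# Then compare interface drop counters on each node
ip -s link show dev enp4s0f0
</pre>
          </div>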
          <div text="#000000" bgcolor="#FFFFFF">
            <p>Brian Andrus<br>
            </p>
          </div>
          <div text="#000000" bgcolor="#FFFFFF"> <br>
            <div class="m_-2156303820911976864moz-cite-prefix">On
              12/5/2017 9:20 AM, Shawn Hall wrote:<br>
            </div>
            <blockquote type="cite"> Hi Brian,<br>
              <br>
              Do you have flow control configured on all ports that are
              on the network path? Lustre has a tendency to cause packet
              loss in ways that performance testing tools don't, because
              of its N-to-1 packet flows, so flow control is often
              necessary. Lnet_selftest should replicate this
              behavior.<br>
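              <br>
              (As a quick sketch, host-side flow control can be checked
              and enabled like this; the interface name is a placeholder,
              and the switch ports need to be checked in the switch's own
              CLI:)<br>
              <pre>
# Show the NIC's current pause-frame (flow control) settings
ethtool -a enp4s0f0

# Enable RX/TX pause frames if they are off
ethtool -A enp4s0f0 rx on tx on

# Per-NIC counters: look for drops, discards, and pause frames
ethtool -S enp4s0f0 | grep -iE 'drop|discard|pause'
</pre>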
              <br>
              Is there a point in the network path where the link
              bandwidth changes (e.g. 40 GbE down to 10 GbE, or 2x40 GbE
              down to 1x40 GbE)? That will commonly be the biggest point
              of loss if flow control isn’t doing its job.<br>
              <br>
              Shawn<br>
              <br>
              On 12/5/17, 11:49 AM, "lustre-discuss on behalf of <a
                class="m_-2156303820911976864moz-txt-link-abbreviated"
                href="mailto:jongwoohan@naver.com" target="_blank"
                moz-do-not-send="true">jongwoohan@naver.com</a>" <a
                class="m_-2156303820911976864moz-txt-link-rfc2396E"
href="mailto:lustre-discuss-bounces@lists.lustre.orgonbehalfofjongwoohan@naver.com"
                target="_blank" moz-do-not-send="true"><lustre-discuss-bounces@lists.lustre.org
                on behalf of jongwoohan@naver.com></a> wrote:<br>
              <br>
              Did you check your connection with iperf and iperf3 using
              TCP bandwidth tests? In that case, those tools cannot
              detect packet drops.<br>
              <br>
              Try checking your block device backend with benchmark
              tools like vdbench or bonnie++. Sometimes a bad block
              device causes incorrect data transfers.<br>
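              For example, a minimal bonnie++ run against an OST's
              backing filesystem might look like this (the mount point
              and size are placeholders; the size should be at least
              twice the node's RAM):<br>
              <pre>
# Sequential IO against the backing filesystem of one OST
# -d test directory, -s file size, -n 0 skips the small-file tests
bonnie++ -d /mnt/ost0_test -s 16g -n 0 -u root
</pre>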
              <br>
              -----Original Message-----<br>
              From: &quot;Brian Andrus&quot;<a
                class="m_-2156303820911976864moz-txt-link-rfc2396E"
                href="mailto:toomuchit@gmail.com" target="_blank"
                moz-do-not-send="true"><toomuchit@gmail.com></a> <br>
              To:
              <a class="m_-2156303820911976864moz-txt-link-rfc2396E"
                href="mailto:lustre-discuss@lists.lustre.org"
                target="_blank" moz-do-not-send="true">"lustre-discuss@lists.lustre.org"</a><a
                class="m_-2156303820911976864moz-txt-link-rfc2396E"
                href="mailto:lustre-discuss@lists.lustre.org"
                target="_blank" moz-do-not-send="true"><lustre-discuss@lists.lustre.org></a>;
              <br>
              Cc: <br>
              Sent: 2017-12-06 (Wed) 01:38:04<br>
              Subject: [lustre-discuss] lustre causing dropped packets<br>
              <br>
              All,<br>
              <br>
              I have a small setup I am testing (1 MGS, 2 OSS) that is
              connected via <br>
              40G ethernet.<br>
              <br>
              I notice that anything I run that writes to the Lustre
              filesystem <br>
              causes dropped packets. Reads do not seem to cause this. I
              have also <br>
              tested the network (iperf, iperf3, general traffic) with
              no dropped packets.<br>
              <br>
              Is there something with writes that can cause dropped
              packets?<br>
              <br>
              <br>
              Brian Andrus<br>
              <br>
              _______________________________________________<br>
              lustre-discuss mailing list<br>
              <a href="mailto:lustre-discuss@lists.lustre.org"
                target="_blank" moz-do-not-send="true">lustre-discuss@lists.lustre.org</a><br>
              <a
                href="http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org"
                target="_blank" moz-do-not-send="true">http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org</a><br>
            </blockquote>
            <br>
          </div>
          _______________________________________________<br>
          lustre-discuss mailing list<br>
          <a href="mailto:lustre-discuss@lists.lustre.org"
            target="_blank" moz-do-not-send="true">lustre-discuss@lists.lustre.org</a><br>
          <a
            href="http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org"
            rel="noreferrer" target="_blank" moz-do-not-send="true">http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org</a><br>
        </blockquote>
      </div>
    </blockquote>
    <br>
  </body>
</html>