[lustre-discuss] lustre causing dropped packets

Brian Andrus toomuchit at gmail.com
Tue Dec 5 12:42:13 PST 2017


Raj,

Thanks for the insight.
It looks like it was the buffer size. The RX ring buffer was increased on
the Lustre nodes and there have been no more dropped packets since.
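
For anyone hitting the same thing, this is roughly what was done (the
interface name is a placeholder; check the maximum your NIC supports
before changing anything):

    ethtool -g eth0          # show current and maximum ring sizes
    ethtool -G eth0 rx 4096  # raise the RX ring toward the reported max

Note the setting does not persist across reboots, so it also needs to go
into the distribution's network configuration.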

Brian Andrus




On 12/5/2017 11:12 AM, Raj wrote:
> Brian,
> I would check the following (example commands after the list):
> - MTU size must be the same across all nodes (servers + clients)
> - peer_credits and credits must be the same across all nodes
> - /proc/sys/lnet/peers can show whether you are constantly running
> into negative credits
> - Buffer overflow counters on the switches, if the switch provides
> them. If the switch buffers are too small to handle the I/O stream,
> you may want to reduce credits.
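>
> For example, something along these lines on each node (eth0 is a
> placeholder, and the module paths assume the socket LND, ksocklnd,
> over Ethernet):
>
>     ip link show eth0 | grep mtu    # MTU must match everywhere
>     cat /sys/module/ksocklnd/parameters/credits
>     cat /sys/module/ksocklnd/parameters/peer_credits
>     cat /proc/sys/lnet/peers        # watch for negative credits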
>
> -Raj
>
>
> On Tue, Dec 5, 2017 at 11:56 AM Brian Andrus <toomuchit at gmail.com> wrote:
>
>     Shawn,
>
>     Flow control is configured, and these connections are all on the
>     same 40G subnet, all directly connected to the same switch.
>
>     I'm a little new to using lnet_selftest, but when I run it 1:1 I
>     do see the dropped-packet count on the client node go up quite
>     significantly. The node I set up as the server does not drop any
>     packets.
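>
>     For reference, the 1:1 run looked roughly like this (the NIDs are
>     placeholders, and the lnet_selftest module has to be loaded on
>     both nodes first):
>
>         modprobe lnet_selftest
>         export LST_SESSION=$$
>         lst new_session droptest
>         lst add_group clients 10.0.0.11@tcp
>         lst add_group servers 10.0.0.21@tcp
>         lst add_batch bulk
>         lst add_test --batch bulk --from clients --to servers brw write size=1M
>         lst run bulk
>         lst stat clients servers   # watch 'ip -s link' for drops meanwhile
>         lst end_session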
>
>     Brian Andrus
>
>
>     On 12/5/2017 9:20 AM, Shawn Hall wrote:
>>     Hi Brian,
>>
>>     Do you have flow control configured on all ports along the
>>     network path? Lustre has a tendency to cause packet loss in ways
>>     that performance-testing tools don't, because of its N-to-1
>>     packet flows (many senders converging on one receiver), so flow
>>     control is often necessary. lnet_selftest should replicate this
>>     behavior.
>>
>>     Is there a point in the network path where the link bandwidth
>>     changes (e.g. 40 GbE down to 10 GbE, or 2x40 GbE down to 1x40
>>     GbE)? That will commonly be the biggest point of loss if flow
>>     control isn’t doing its job.
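>>
>>     On the host side, pause/flow control settings can be checked
>>     with something like this (eth0 is a placeholder; the switch
>>     ports have to be verified separately):
>>
>>         ethtool -a eth0               # show current pause settings
>>         ethtool -A eth0 rx on tx on   # enable pause frames if needed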
>>
>>     Shawn
>>
>>     On 12/5/17, 11:49 AM, "lustre-discuss on behalf of
>>     jongwoohan at naver.com" <lustre-discuss-bounces at
>>     lists.lustre.org on behalf of jongwoohan at naver.com> wrote:
>>
>>     Did you check your connection with iperf and iperf3 using TCP?
>>     In TCP mode these tools will not show packet drops, since
>>     retransmission hides them.
>>
>>     Also try checking whether your block device back end is
>>     responsible, using benchmark tools like vdbench or bonnie++.
>>     Sometimes a bad block device causes incorrect data transfers.
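>>
>>     For example (the server address and test directory are
>>     placeholders; UDP mode reports loss directly, unlike the default
>>     TCP test):
>>
>>         iperf3 -s                             # on one node
>>         iperf3 -c 10.0.0.21 -u -b 10G -t 30   # on the other, UDP
>>         bonnie++ -d /mnt/ost0_test -u root    # back-end sanity check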
>>
>>     -----Original Message-----
>>     From: "Brian Andrus" <toomuchit at gmail.com>
>>     To: "lustre-discuss at lists.lustre.org"
>>     <lustre-discuss at lists.lustre.org>
>>     Cc:
>>     Sent: 2017-12-06 (Wed) 01:38:04
>>     Subject: [lustre-discuss] lustre causing dropped packets
>>
>>     All,
>>
>>     I have a small setup I am testing (1 MGS, 2 OSS) that is
>>     connected via 40G Ethernet.
>>
>>     I notice that anything that writes to the Lustre filesystem
>>     causes dropped packets. Reads do not seem to cause this. I have
>>     also tested the network (iperf, iperf3, general traffic) with no
>>     dropped packets.
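>>
>>     The drops show up in the interface counters while a write is
>>     running, e.g. (eth0 is a placeholder):
>>
>>         ip -s link show eth0            # RX/TX dropped counters
>>         ethtool -S eth0 | grep -i drop  # per-queue NIC drop stats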
>>
>>     Is there something with writes that can cause dropped packets?
>>
>>
>>     Brian Andrus
>>
>>     _______________________________________________
>>     lustre-discuss mailing list
>>     lustre-discuss at lists.lustre.org
>>     http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
>
>     _______________________________________________
>     lustre-discuss mailing list
>     lustre-discuss at lists.lustre.org
>     http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
