[lustre-discuss] frequent Connection lost, Connection restored to mdt

David Cohen cdavid at physics.technion.ac.il
Mon Dec 23 07:46:18 PST 2019


Hi,
Yes, I do see load on the client side, but since the client has a 40 Gb/s NIC
and the load comes from a 10 Gb/s WAN link, I wouldn't expect it to saturate
the network.
I can correlate the messages with WAN load above 6 Gb/s, which is still far
from the limit of the NIC.
The client has a latest-generation Xeon processor, so I wouldn't expect the CPU
to be the bottleneck either.
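A minimal Python sketch of one way to make that correlation explicit: take the timestamps of the client's "was lost" messages from syslog and look up the WAN throughput sampled just before each one. The function names, the (time, Gb/s) sample format, and the source of the samples are assumptions for illustration, not part of Lustre or of any particular monitoring tool. For scale, 6 Gb/s is 15% of the 40 Gb/s NIC's line rate but 60% of the 10 Gb/s WAN link.

```python
from bisect import bisect_right
from datetime import datetime


def lost_events(lines, year):
    """Timestamps of client-side "Connection to ... was lost" messages.

    Assumes classic syslog lines ("Dec 22 15:26:34 host kernel: ..."), one
    message per line. Syslog stamps carry no year, so it is passed in.
    """
    events = []
    for line in lines:
        if "was lost" in line:
            stamp = line[:15]  # e.g. "Dec 22 15:26:34"
            events.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return events


def load_at_events(events, samples):
    """Most recent throughput sample (Gb/s) at or before each event.

    samples: time-sorted list of (datetime, gbps) pairs from whatever
    monitoring records WAN throughput (hypothetical input format).
    Events earlier than the first sample map to None.
    """
    times = [t for t, _ in samples]
    loads = []
    for event in events:
        i = bisect_right(times, event)  # number of samples taken at or before the event
        loads.append(samples[i - 1][1] if i > 0 else None)
    return loads


def fraction_above(loads, threshold_gbps):
    """Share of events whose preceding sample exceeded threshold_gbps."""
    known = [x for x in loads if x is not None]
    if not known:
        return 0.0
    return sum(1 for x in known if x > threshold_gbps) / len(known)
```

For the result to mean anything, compare that fraction with the share of *all* samples above the same threshold; if the reconnects really cluster in high-load periods, the first number should be clearly larger.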

David


On Mon, Dec 23, 2019 at 5:09 PM Degremont, Aurelien <degremoa at amazon.com>
wrote:

> Hi
>
> These messages mean the client thinks it has lost communication with
> the server and reconnects. The server only sees the reconnection and never
> thought the client was gone.
>
> It could be related to lots of things. The server could be receiving RPCs
> from this client but not processing them fast enough. Are there other errors
> on your server? Is there any high load?
>
> Same on your clients? Is there any high load that could prevent your
> client from communicating with your server properly?
>
> Can you correlate that with some specific load running on your clients?
>
> Aurélien
>
> *From: *lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of
> David Cohen <cdavid at physics.technion.ac.il>
> *Date: *Sunday, 22 December 2019 at 17:08
> *To: *"lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
> *Subject: *[lustre-discuss] frequent Connection lost, Connection restored
> to mdt
>
> Hi,
>
> We are running 2.10.5 on the servers and 2.10.8 on the clients.
>
> Every few minutes, we see:
>
> On the client side:
>
> Dec 22 15:26:34 gftp kernel: Lustre:
> 439834:0:(client.c:2116:ptlrpc_expire_one_request()) @@@ Request sent has
> timed out for slow reply: [sent 1577021187/real 1577021187]
>  req@ffff88160be9c6c0 x1653620348981536/t0(0)
> o36->lustre-MDT0000-mdc-ffff8817d9776c00@10.0.0.1@tcp:12/10 lens 608/4768
> e 0 to 1 dl 1577021194 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
> Dec 22 15:26:34 gftp kernel: Lustre:
> 439834:0:(client.c:2116:ptlrpc_expire_one_request()) Skipped 3 previous
> similar messages
> Dec 22 15:26:34 gftp kernel: Lustre: lustre-MDT0000-mdc-ffff8817d9776c00:
> Connection to lustre-MDT0000 (at 10.0.0.1@tcp) was lost; in progress
> operations using this service will wait for recovery to complete
> Dec 22 15:26:34 gftp kernel: Lustre: Skipped 3 previous similar messages
> Dec 22 15:26:34 gftp kernel: Lustre: lustre-MDT0000-mdc-ffff8817d9776c00:
> Connection restored to 10.0.0.1@tcp (at 192.114.101.153@tcp)
> Dec 22 15:26:34 gftp kernel: Lustre: Skipped 3 previous similar messages
>
> On the server side:
>
> Dec 22 15:26:34 oss03 kernel: Lustre: lustre-MDT0000: Client
> 38d6eef1-e146-be41-bab9-409b272d0d4f (at 10.0.0.10@tcp) reconnecting
> Dec 22 15:26:34 oss03 kernel: Lustre: lustre-MDT0000: Connection restored
> to ec2cdfce-353f-583a-c970-fde3f5d5189c (at 10.0.0.10@tcp)
>
>
>
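The client's timeout message also records how long it waited before giving up. A small Python sketch, assuming only the field layout visible in the quoted log ("[sent <epoch>/real <epoch>]" and "dl <epoch>", as Unix timestamps in seconds), pulls those fields out: the request was sent at 1577021187 and its deadline was 1577021194, i.e. 7 seconds later. In UTC that deadline is 13:26:34, which lines up with the 15:26:34 syslog stamp if the client's clock is set to UTC+2 (an assumption about the host's time zone). The regexes and the `request_window` name are illustrative, not a Lustre tool.

```python
import re
from datetime import datetime, timezone

# Fields of the ptlrpc "Request sent has timed out" message as it appears in the
# quoted log: "[sent <epoch>/real <epoch>]" and "dl <epoch>" (seconds since epoch).
SENT_RE = re.compile(r"\[sent (\d+)/real (\d+)\]")
DEADLINE_RE = re.compile(r"\bdl (\d+)\b")


def request_window(message):
    """Return (sent, deadline, seconds_allowed) for one timed-out request."""
    sent_match = SENT_RE.search(message)
    dl_match = DEADLINE_RE.search(message)
    if sent_match is None or dl_match is None:
        raise ValueError("not a ptlrpc 'Request sent has timed out' message")
    sent = int(sent_match.group(1))
    deadline = int(dl_match.group(1))
    return sent, deadline, deadline - sent


if __name__ == "__main__":
    # The client message from the thread, rejoined onto one line.
    msg = ("Request sent has timed out for slow reply: "
           "[sent 1577021187/real 1577021187] req@ffff88160be9c6c0 "
           "x1653620348981536/t0(0) o36->lustre-MDT0000-mdc-ffff8817d9776c00"
           "@10.0.0.1@tcp:12/10 lens 608/4768 e 0 to 1 dl 1577021194 ref 2 "
           "fl Rpc:X/0/ffffffff rc 0/-1")
    sent, deadline, allowed = request_window(msg)
    for label, ts in (("sent", sent), ("deadline", deadline)):
        print(label, datetime.fromtimestamp(ts, tz=timezone.utc).isoformat())
    print("seconds allowed:", allowed)
```

Run over every timeout message in the client log, this shows whether the allowed window stays this short or varies from one request to the next.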