[lustre-discuss] frequent Connection lost, Connection restored to mdt

Degremont, Aurelien degremoa at amazon.com
Mon Dec 23 07:09:40 PST 2019


Hi

These messages mean that the client believes it lost communication with the server and has reconnected. The server sees only the reconnection and never considered the client gone.

This could be related to many things. The server could be receiving RPCs from this client but not processing them fast enough. Are there other errors on your server? Is it under high load?
Same question for your clients: is there any high load that could prevent a client from communicating properly with your server?

Can you correlate these messages with a specific workload running on your clients?
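
One way to check for such a correlation is to count the "Connection to ... was lost" events per minute and compare the busy minutes with your job or transfer logs. The following is only a minimal sketch in Python, not a Lustre tool: the function name lost_per_minute and the year argument are assumptions made for illustration (syslog timestamps carry no year), and the pattern is based solely on the client log lines quoted below. Month names are parsed with %b, which assumes an English locale.

```python
import re
from collections import Counter
from datetime import datetime

# Matches client-side syslog lines such as:
#   Dec 22 15:26:34 gftp kernel: Lustre: <device>: Connection to <target> (...) was lost; ...
LOST = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel: Lustre: "
    r"\S+: Connection to (?P<target>\S+) .*was lost"
)


def lost_per_minute(lines, year=2019):
    """Count 'Connection ... was lost' events per (target, minute).

    `year` is an assumption supplied by the caller, because the
    syslog timestamp format shown in this thread does not include it.
    """
    counts = Counter()
    for line in lines:
        m = LOST.match(line)
        if m is None:
            continue  # not a connection-lost message
        ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        counts[(m.group("target"), ts.replace(second=0))] += 1
    return counts


def report(counts):
    """Return one human-readable line per (target, minute) bucket."""
    return [
        f"{minute:%Y-%m-%d %H:%M} {target} lost={n}"
        for (target, minute), n in sorted(counts.items())
    ]
```

For example, passing the lines of the client's syslog file to lost_per_minute() and printing the result of report() gives one line per target and minute, which can then be lined up against load graphs or job start times.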

Aurélien

From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of David Cohen <cdavid at physics.technion.ac.il>
Date: Sunday, 22 December 2019 at 17:08
To: "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
Subject: [lustre-discuss] frequent Connection lost, Connection restored to mdt

Hi,
We are running 2.10.5 on the servers and 2.10.8 on the clients.
Every few minutes, we see:

On client side:

Dec 22 15:26:34 gftp kernel: Lustre: 439834:0:(client.c:2116:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1577021187/real 1577021187]  req@ffff88160be9c6c0 x1653620348981536/t0(0) o36->lustre-MDT0000-mdc-ffff8817d9776c00@10.0.0.1@tcp:12/10 lens 608/4768 e 0 to 1 dl 1577021194 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Dec 22 15:26:34 gftp kernel: Lustre: 439834:0:(client.c:2116:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Dec 22 15:26:34 gftp kernel: Lustre: lustre-MDT0000-mdc-ffff8817d9776c00: Connection to lustre-MDT0000 (at 10.0.0.1@tcp) was lost; in progress operations using this service will wait for recovery to complete
Dec 22 15:26:34 gftp kernel: Lustre: Skipped 3 previous similar messages
Dec 22 15:26:34 gftp kernel: Lustre: lustre-MDT0000-mdc-ffff8817d9776c00: Connection restored to 10.0.0.1@tcp (at 192.114.101.153@tcp)
Dec 22 15:26:34 gftp kernel: Lustre: Skipped 3 previous similar messages

On server side:

Dec 22 15:26:34 oss03 kernel: Lustre: lustre-MDT0000: Client 38d6eef1-e146-be41-bab9-409b272d0d4f (at 10.0.0.10@tcp) reconnecting
Dec 22 15:26:34 oss03 kernel: Lustre: lustre-MDT0000: Connection restored to ec2cdfce-353f-583a-c970-fde3f5d5189c (at 10.0.0.10@tcp)
