[lustre-discuss] MDS often overload

Zeeshan Ali Shah javaclinic at gmail.com
Thu Oct 11 00:57:29 PDT 2018


I have fixed the issue.

Somehow, because of a misconfiguration, two systems had the same client IP address. After changing it, everything is OK.
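
A duplicate address like this can be caught before it causes trouble by checking host-to-IP assignments for collisions. Below is a minimal sketch in Python. It assumes you have already collected a mapping of hostnames to their IPoIB addresses, for example by running `lctl list_nids` on each node or by exporting your own inventory. The hostnames and addresses are hypothetical, and the check only sees collisions in the data you give it. It does not probe the network.

```python
from collections import defaultdict


def find_duplicate_ips(hosts: dict[str, str]) -> dict[str, list[str]]:
    """Return {ip: [hostnames]} for every IP claimed by more than one host."""
    by_ip: dict[str, list[str]] = defaultdict(list)
    for host, ip in hosts.items():
        by_ip[ip].append(host)
    # Keep only addresses shared by two or more hosts, with hostnames sorted.
    return {ip: sorted(names) for ip, names in by_ip.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical inventory: hostnames and addresses are invented for illustration.
    inventory = {
        "client01": "172.100.120.101",
        "client02": "172.100.120.102",
        "client03": "172.100.120.101",  # misconfigured: same IP as client01
    }
    for ip, names in find_duplicate_ips(inventory).items():
        print(f"{ip} is configured on: {', '.join(names)}")
```

Grouping by address rather than comparing every pair of hosts keeps the check linear in the number of hosts. It also reports all hosts sharing an address, not just the first pair found.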

/Zee

On Mon, Oct 8, 2018 at 12:51 PM Zeeshan Ali Shah <javaclinic at gmail.com>
wrote:

> We are getting the following errors when running rsync.
>
> Background: we have three filesystems on the same MDS, and the MDTs are on
> different ZFS pools. Would that be an issue?
>
> Any advice?
>
> Errors below:
> --------
> [Mon Oct  8 12:29:11 2018] Lustre: sgp-MDT0000-mdc-ffff883ffc2c3000: Connection restored to 172.100.120.25@o2ib (at 172.100.120.25@o2ib)
> [Mon Oct  8 12:29:11 2018] Lustre: Skipped 14 previous similar messages
> [Mon Oct  8 12:31:25 2018] LNet: 31340:0:(o2iblnd_cb.c:1350:kiblnd_reconnect_peer()) Abort reconnection of 172.100.120.25@o2ib: connected
> [Mon Oct  8 12:31:25 2018] LNet: 31340:0:(o2iblnd_cb.c:1350:kiblnd_reconnect_peer()) Skipped 9 previous similar messages
> [Mon Oct  8 12:31:32 2018] Lustre: 54961:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1538991905/real 1538991905] req@ffff8819acf68f00 x1611847503061104/t0(0) o36->sgp-MDT0000-mdc-ffff883ffc2c3000@172.100.120.25@o2ib:12/10 lens 608/33520 e 0 to 1 dl 1538991912 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
> [Mon Oct  8 12:31:32 2018] Lustre: 54961:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
> [Mon Oct  8 12:31:32 2018] Lustre: sgp-MDT0000-mdc-ffff883ffc2c3000: Connection to sgp-MDT0000 (at 172.100.120.25@o2ib) was lost; in progress operations using this service will wait for recovery to complete
> [Mon Oct  8 12:31:32 2018] Lustre: Skipped 2 previous similar messages
> [Mon Oct  8 12:31:32 2018] Lustre: sgp-MDT0000-mdc-ffff883ffc2c3000: Connection restored to 172.100.120.25@o2ib (at 172.100.120.25@o2ib)
> [Mon Oct  8 12:31:32 2018] Lustre: Skipped 2 previous similar messages
> [Mon Oct  8 12:34:01 2018] LNet: 25934:0:(o2iblnd_cb.c:2307:kiblnd_passive_connect()) Stale connection request
> [Mon Oct  8 12:34:01 2018] LNet: 25934:0:(o2iblnd_cb.c:2307:kiblnd_passive_connect()) Skipped 2 previous similar messages
> [Mon Oct  8 12:34:01 2018] LNet: 31340:0:(o2iblnd_cb.c:1350:kiblnd_reconnect_peer()) Abort reconnection of 172.100.120.25@o2ib: connected
> [Mon Oct  8 12:34:01 2018] LNet: 31340:0:(o2iblnd_cb.c:1350:kiblnd_reconnect_peer()) Skipped 3 previous similar messages
> [Mon Oct  8 12:34:08 2018] Lustre: 54961:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1538992061/real 1538992061] req@ffff881ad060f500 x1611847503440304/t0(0) o101->sgp-MDT0000-mdc-ffff883ffc2c3000@172.100.120.25@o2ib:12/10 lens 880/33728 e 0 to 1 dl 1538992068 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
> [Mon Oct  8 12:34:08 2018] Lustre: 54961:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 1 previous similar message
> [Mon Oct  8 12:34:08 2018] Lustre: sgp-MDT0000-mdc-ffff883ffc2c3000: Connection to sgp-MDT0000 (at 172.100.120.25@o2ib) was lost; in progress operations using this service will wait for recovery to complete
> [Mon Oct  8 12:34:08 2018] Lustre: Skipped 1 previous similar message
> [Mon Oct  8 12:34:08 2018] Lustre: sgp-MDT0000-mdc-ffff883ffc2c3000: Connection restored to 172.100.120.25@o2ib (at 172.100.120.25@o2ib)
> [Mon Oct  8 12:34:08 2018] Lustre: Skipped 1 previous similar message
> ------
>
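
The log above shows the connection to the same NID being lost and restored repeatedly, along with `Stale connection request` and aborted reconnects. That pattern is consistent with the duplicate-address diagnosis, since two nodes presenting one address could keep displacing each other's connections. This is an interpretation, not something Lustre reports directly. Below is a minimal Python sketch for summarising such a log. It assumes each console message has been unwrapped onto a single line, as above. It counts lost and restored events per NID and computes `dl - sent` for each timed-out request. Treating `dl` as the request deadline in epoch seconds is an assumption about the ptlrpc message format.

```python
import re
from collections import Counter

# Patterns for Lustre client console messages, one message per line (unwrapped).
LOST = re.compile(r"Connection to (\S+) \(at (\S+)\) was lost")
RESTORED = re.compile(r"Connection restored to \S+ \(at (\S+)\)")
# Assumption: "dl" in ptlrpc_expire_one_request() output is the request deadline
# in epoch seconds, so "dl - sent" is the time the request was allowed.
TIMEOUT = re.compile(r"\[sent (\d+)/real \d+\].*? dl (\d+) ")


def summarize(lines):
    """Count lost/restored events per NID and collect (dl - sent) per timed-out request.

    Rate-limited messages ("Skipped N previous similar messages") are not
    expanded, so the counts are lower bounds.
    """
    lost, restored = Counter(), Counter()
    timeouts = []
    for line in lines:
        if m := LOST.search(line):
            lost[m.group(2)] += 1
        elif m := RESTORED.search(line):
            restored[m.group(1)] += 1
        elif m := TIMEOUT.search(line):
            timeouts.append(int(m.group(2)) - int(m.group(1)))
    return lost, restored, timeouts


if __name__ == "__main__":
    # Three messages taken from the log above, timestamps stripped.
    sample = [
        "Lustre: sgp-MDT0000-mdc-ffff883ffc2c3000: Connection to sgp-MDT0000 "
        "(at 172.100.120.25@o2ib) was lost; in progress operations using this "
        "service will wait for recovery to complete",
        "Lustre: sgp-MDT0000-mdc-ffff883ffc2c3000: Connection restored to "
        "172.100.120.25@o2ib (at 172.100.120.25@o2ib)",
        "Lustre: 54961:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request "
        "sent has timed out for slow reply: [sent 1538991905/real 1538991905] "
        "req@ffff8819acf68f00 x1611847503061104/t0(0) "
        "o36->sgp-MDT0000-mdc-ffff883ffc2c3000@172.100.120.25@o2ib:12/10 "
        "lens 608/33520 e 0 to 1 dl 1538991912 ref 2 fl Rpc:X/0/ffffffff rc 0/-1",
    ]
    lost, restored, timeouts = summarize(sample)
    for nid in sorted(lost.keys() | restored.keys()):
        print(f"{nid}: lost {lost[nid]}, restored {restored[nid]}")
    print("dl - sent (seconds):", timeouts)
```

For both timed-out requests in the log, `dl - sent` works out to 7 seconds (1538991912 - 1538991905 and 1538992068 - 1538992061).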

