[lustre-discuss] Filesystem hanging....

Colin Faber colin.faber at seagate.com
Thu Aug 11 08:10:29 PDT 2016


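At first glance it looks like you're having network connectivity problems
(possibly a driver issue with your NIC?): the client repeatedly times out
waiting for home-MDT0000 (192.168.0.4@tcp) and then reconnects.

Have you checked the MTU settings, etc.?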
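A minimal Python sketch for the MTU check. It assumes Linux nodes, where the kernel exposes each interface's MTU at /sys/class/net/<iface>/mtu. The interface name and host labels are placeholders, because nothing in the log says which interface carries the @tcp NID.

```python
from collections import defaultdict
from pathlib import Path


def read_mtu(iface, sysfs_net="/sys/class/net"):
    """Return the MTU Linux reports for one interface (e.g. /sys/class/net/eth0/mtu)."""
    return int((Path(sysfs_net) / iface / "mtu").read_text().strip())


def group_by_mtu(host_mtus):
    """Group hosts by MTU; more than one key in the result means the MTUs disagree.

    host_mtus: {host label: MTU}, e.g. values collected with read_mtu() on each node.
    Host labels are placeholders chosen by whoever runs the check.
    """
    groups = defaultdict(list)
    for host, mtu in sorted(host_mtus.items()):
        groups[mtu].append(host)
    return dict(groups)
```

Run `read_mtu()` on the frontend and on the MDS for the interface carrying LNet traffic, then pass both values to `group_by_mtu()`. Matching MTUs do not rule out a NIC, driver or switch problem, so treat this as a first check only.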

-cf


On Thu, Aug 11, 2016 at 7:43 AM, Phill Harvey-Smith <p.harvey-smith at warwick.ac.uk> wrote:

> Hi all,
>
> I have a (fairly urgent) problem.
>
> I have been updating our cluster to Ubuntu 16.04. On the whole this has
> gone well; however, in the last part I have run across a rather serious
> error.
>
> Our frontend node runs an instance of Samba that shares out the home
> directories. We have found that writing to a file on /home causes the
> /home mount to time out and become inaccessible.
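To check whether the hang needs Samba at all, one option is to write to /home directly on the frontend under a watchdog. This is a hedged Python sketch: the function name is hypothetical, and it only reports whether a plain create/write/fsync finishes within a timeout.

```python
import os
import threading


def write_completes(path, timeout_s=30.0, payload=b"lustre write test\n"):
    """Create, write and fsync a small file from a worker thread.

    Returns True if the write finishes within timeout_s, False if it is still
    blocked after that, and re-raises any OSError the write produced.
    """
    done = threading.Event()
    errors = []

    def worker():
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            try:
                os.write(fd, payload)
                os.fsync(fd)  # push the data to the filesystem, not just the page cache
            finally:
                os.close(fd)
        except OSError as exc:
            errors.append(exc)
        finally:
            done.set()

    # daemon=True: the interpreter does not wait for this thread at exit,
    # although a thread stuck inside a kernel call cannot be cancelled.
    threading.Thread(target=worker, daemon=True).start()
    if not done.wait(timeout_s):
        return False
    if errors:
        raise errors[0]
    return True
```

For example, `write_completes("/home/<user>/probe.txt", timeout_s=60)`, where the path is a placeholder. If it returns True while writes through Samba still hang, the trigger is more likely in how Samba accesses the file; if it returns False, the Lustre client hangs without Samba being involved.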
>
> The errors reported in the journal look like this:
>
>> Aug 11 10:06:34 buster-fe0 kernel: Lustre: Lustre: Build Version: 2.8.53_51_g3680fa1_dirty
>> Aug 11 10:06:34 buster-fe0 kernel: Lustre: Server MGS version (2.1.0.0) is much older than client. Consider upgrading server (2.8.53_51_g3680fa1_dirty)
>> Aug 11 10:06:34 buster-fe0 kernel: Lustre: Trying to mount a client with IR setting not compatible with current mgc. Force to use current mgc setting that is IR disabled.
>> Aug 11 10:06:34 buster-fe0 kernel: Lustre: Mounted home-client
>> Aug 11 10:06:34 buster-fe0 mount[4687]: mount.lustre: addmntent: Invalid argument:
>> Aug 11 10:06:34 buster-fe0 mount[4691]: mount.lustre: addmntent: Invalid argument:
>> Aug 11 10:06:34 buster-fe0 mount[4670]: mount.lustre: addmntent: Invalid argument:
>> Aug 11 10:06:35 buster-fe0 systemd[1]: Started Lustre setup.
>> Aug 11 10:06:37 buster-fe0 lustre.sh[4870]: llite.home-ffff881005c37800.create_no_open_optimization=0
>> Aug 11 13:48:12 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470919685/real 1470919685]  req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470919692 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
>> Aug 11 13:48:12 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 13:48:12 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 13:48:19 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470919692/real 1470919692]  req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470919699 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 13:48:19 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 13:48:19 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 13:48:26 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470919699/real 1470919699]  req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470919706 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 13:48:26 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 13:48:26 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 13:48:33 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470919706/real 1470919706]  req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470919713 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 13:48:33 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 13:48:33 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 13:48:40 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470919713/real 1470919713]  req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470919720 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 13:48:40 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 13:48:40 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 13:48:54 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470919727/real 1470919727]  req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470919734 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 13:48:54 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) Skipped 1 previous similar message
>> Aug 11 13:48:54 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 13:48:54 buster-fe0 kernel: Lustre: Skipped 1 previous similar message
>> Aug 11 13:48:54 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 13:48:54 buster-fe0 kernel: Lustre: Skipped 1 previous similar message
>> Aug 11 13:49:15 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470919748/real 1470919748]  req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470919755 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 13:49:15 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
>> Aug 11 13:49:15 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 13:49:15 buster-fe0 kernel: Lustre: Skipped 2 previous similar messages
>> Aug 11 13:49:15 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 13:49:15 buster-fe0 kernel: Lustre: Skipped 2 previous similar messages
>> Aug 11 13:49:50 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470919783/real 1470919783]  req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470919790 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 13:49:50 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
>> Aug 11 13:49:50 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 13:49:50 buster-fe0 kernel: Lustre: Skipped 4 previous similar messages
>> Aug 11 13:49:50 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 13:49:50 buster-fe0 kernel: Lustre: Skipped 4 previous similar messages
>> Aug 11 13:51:00 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470919853/real 1470919853]  req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470919860 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 13:51:00 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) Skipped 9 previous similar messages
>> Aug 11 13:51:00 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 13:51:00 buster-fe0 kernel: Lustre: Skipped 9 previous similar messages
>> Aug 11 13:51:00 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 13:51:00 buster-fe0 kernel: Lustre: Skipped 9 previous similar messages
>> Aug 11 13:53:13 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470919986/real 1470919986]  req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470919993 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 13:53:13 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) Skipped 18 previous similar messages
>> Aug 11 13:53:13 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 13:53:13 buster-fe0 kernel: Lustre: Skipped 18 previous similar messages
>> Aug 11 13:53:13 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 13:53:13 buster-fe0 kernel: Lustre: Skipped 18 previous similar messages
>> Aug 11 13:55:38 buster-fe0 kernel:  [<ffffffffc0c588fc>] ? ll_lookup_finish_locks+0xfc/0x8a0 [lustre]
>> Aug 11 13:57:32 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470920245/real 1470920245]  req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470920252 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 13:57:32 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) Skipped 36 previous similar messages
>> Aug 11 13:57:32 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 13:57:32 buster-fe0 kernel: Lustre: Skipped 36 previous similar messages
>> Aug 11 13:57:32 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 13:57:32 buster-fe0 kernel: Lustre: Skipped 36 previous similar messages
>> Aug 11 13:57:38 buster-fe0 kernel:  [<ffffffffc0c588fc>] ? ll_lookup_finish_locks+0xfc/0x8a0 [lustre]
>> Aug 11 13:57:38 buster-fe0 kernel:  [<ffffffffc0c588fc>] ? ll_lookup_finish_locks+0xfc/0x8a0 [lustre]
>> Aug 11 13:57:38 buster-fe0 kernel:  [<ffffffffc0c588fc>] ? ll_lookup_finish_locks+0xfc/0x8a0 [lustre]
>> Aug 11 13:59:38 buster-fe0 kernel:  [<ffffffffc0c588fc>] ? ll_lookup_finish_locks+0xfc/0x8a0 [lustre]
>> Aug 11 13:59:38 buster-fe0 kernel:  [<ffffffffc0c588fc>] ? ll_lookup_finish_locks+0xfc/0x8a0 [lustre]
>> Aug 11 13:59:38 buster-fe0 kernel:  [<ffffffffc0c588fc>] ? ll_lookup_finish_locks+0xfc/0x8a0 [lustre]
>> Aug 11 14:06:10 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470920763/real 1470920763]  req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470920770 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 14:06:10 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) Skipped 73 previous similar messages
>> Aug 11 14:06:10 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 14:06:10 buster-fe0 kernel: Lustre: Skipped 73 previous similar messages
>> Aug 11 14:06:10 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 14:06:10 buster-fe0 kernel: Lustre: Skipped 73 previous similar messages
>> Aug 11 14:16:12 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470921365/real 1470921365]  req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470921372 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 14:16:12 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) Skipped 85 previous similar messages
>> Aug 11 14:16:12 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 14:16:12 buster-fe0 kernel: Lustre: Skipped 85 previous similar messages
>> Aug 11 14:16:12 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 14:16:12 buster-fe0 kernel: Lustre: Skipped 85 previous similar messages
>> Aug 11 14:26:14 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470921967/real 1470921967]  req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470921974 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 14:26:14 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) Skipped 85 previous similar messages
>> Aug 11 14:26:14 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 14:26:14 buster-fe0 kernel: Lustre: Skipped 85 previous similar messages
>> Aug 11 14:26:14 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 14:26:14 buster-fe0 kernel: Lustre: Skipped 85 previous similar messages
>> Aug 11 14:36:16 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470922569/real 1470922569]  req@ffff8807e6076c00 x1542357171466016/t0(0) o55->home-MDT0000-mdc-ffff881005c37800@192.168.0.4@tcp:12/10 lens 592/224 e 0 to 1 dl 1470922576 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
>> Aug 11 14:36:16 buster-fe0 kernel: Lustre: 20221:0:(client.c:2067:ptlrpc_expire_one_request()) Skipped 85 previous similar messages
>> Aug 11 14:36:16 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection to home-MDT0000 (at 192.168.0.4@tcp) was lost; in progress operations using this service will wait for recovery to complete
>> Aug 11 14:36:16 buster-fe0 kernel: Lustre: Skipped 85 previous similar messages
>> Aug 11 14:36:16 buster-fe0 kernel: Lustre: home-MDT0000-mdc-ffff881005c37800: Connection restored to 192.168.0.4@tcp (at 192.168.0.4@tcp)
>> Aug 11 14:36:16 buster-fe0 kernel: Lustre: Skipped 85 previous similar messages
>>
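The timeout entries above can be summarized mechanically. The Python sketch below assumes one journal entry per line (as `journalctl -k` prints them, not wrapped as in this email) and the exact message format of this 2.8.53 client; the regular expressions and names are illustrative, not part of Lustre. For each printed timeout it extracts the request xid, target, NID, send time and reply deadline (`dl`), and uses the following "Skipped N previous similar messages" count to estimate how often the request was actually resent.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# Assumed format: the ptlrpc_expire_one_request() timeout lines quoted above.
TIMEOUT_RE = re.compile(
    r"\[sent (?P<sent>\d+)/real \d+\]\s+req@\S+ x(?P<xid>\d+)/\S+ "
    r"o(?P<opc>\d+)->(?P<target>[^@]+)@(?P<nid>[^:]+):\S+ .*? dl (?P<dl>\d+)"
)
# Only the rate-limit notice printed by ptlrpc_expire_one_request() itself.
SKIP_RE = re.compile(
    r"ptlrpc_expire_one_request\(\)\) Skipped (?P<n>\d+) previous similar message"
)


@dataclass
class Timeout:
    xid: int
    opc: int
    target: str
    nid: str
    sent: int                # epoch seconds when this attempt was sent
    deadline: int            # epoch seconds of the reply deadline ("dl")
    skipped_before: int = 0  # similar messages suppressed before this one was printed

    @property
    def rpc_timeout_s(self):
        return self.deadline - self.sent

    @property
    def sent_utc(self):
        return datetime.fromtimestamp(self.sent, tz=timezone.utc)


def parse_timeouts(lines):
    """Collect printed timeouts, attaching each 'Skipped N' count to the entry it follows."""
    events = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            events.append(Timeout(int(m["xid"]), int(m["opc"]), m["target"],
                                  m["nid"], int(m["sent"]), int(m["dl"])))
            continue
        s = SKIP_RE.search(line)
        if s and events:
            events[-1].skipped_before = int(s["n"])
    return events


def resend_periods(events):
    """Estimated seconds per attempt: gap between printed sends / (suppressed + 1)."""
    return [(cur.sent - prev.sent) / (cur.skipped_before + 1)
            for prev, cur in zip(events, events[1:])]
```

Applied to the log quoted above, every entry has the same xid (1542357171466016), a 7 s gap between `sent` and `dl`, and every estimated period comes out at 7.0 s. That suggests the client kept resending the same request to home-MDT0000 about every 7 seconds for roughly 48 minutes, and that the growing gaps between printed messages come from console rate-limiting rather than slower retries.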
>
> I also can't get the recently checked-out sources to compile, but will
> post a separate query about that. :)
>
> Cheers.
>
> Phill.
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org