[lustre-discuss] Repeatable ldlm_enqueue error

Raj Ayyampalayam ansraj at gmail.com
Wed Oct 30 13:37:18 PDT 2019


Hello,

A particular job (MPI Maker genome annotation) on our cluster produces the
errors below, and the job fails with a "Could not open file" error.
Server: The server is running lustre-2.10.4.
Client: I've tried 2.10.5, 2.10.8, and 2.12.3, all with the same result.
I don't see any of the other servers (the other MDS and OSS nodes) reporting
communication loss to the client, and the IB fabric is stable. The job runs
to completion when using local storage on the node or NFS-mounted storage.
The job generates a lot of I/O, but it does not increase the load on the
Lustre servers.
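
For reference, the return codes look like standard errno values: -107 is
ENOTCONN and -108 is ESHUTDOWN, i.e. the failures show up after the client
has lost its connection to the MDT. If it would help, I can capture
client-side lock/RPC traces around a reproduction, roughly like this (just a
sketch; the debug flags, buffer size, and output path are my own choices,
nothing required):

  # on the client, before starting the job
  lctl set_param debug=+dlmtrace        # add LDLM lock tracing to the debug mask
  lctl set_param debug=+rpctrace        # add RPC tracing as well
  lctl set_param debug_mb=512           # enlarge the in-kernel debug buffer
  lctl clear                            # empty the buffer
  # ... reproduce the failure ...
  lctl dk /tmp/lustre-client-debug.log  # dump the debug buffer to a file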

Client:
Oct 22 14:56:39 n305 kernel: LustreError: 11-0:
lustre2-MDT0000-mdc-ffff8c3f222c4800: operation ldlm_enqueue to node
10.55.49.215@o2ib failed: rc = -107
Oct 22 14:56:39 n305 kernel: Lustre: lustre2-MDT0000-mdc-ffff8c3f222c4800:
Connection to lustre2-MDT0000 (at 10.55.49.215@o2ib) was lost; in progress
operations using this service will wait for recovery to complete
Oct 22 14:56:39 n305 kernel: Lustre: Skipped 2 previous similar messages
Oct 22 14:56:39 n305 kernel: LustreError: 167-0:
lustre2-MDT0000-mdc-ffff8c3f222c4800: This client was evicted by
lustre2-MDT0000; in progress operations using this service will fail.
Oct 22 14:56:39 n305 kernel: LustreError:
125851:0:(file.c:172:ll_close_inode_openhandle())
lustre2-clilmv-ffff8c3f222c4800: inode [0x20000ef38:0xffd6:0x0] mdc close
failed: rc = -108
Oct 22 14:56:39 n305 kernel: LustreError: Skipped 1 previous similar message
Oct 22 14:56:40 n305 kernel: LustreError:
125959:0:(file.c:3644:ll_inode_revalidate_fini()) lustre2: revalidate FID
[0x20000eedf:0xed9d:0x0] error: rc = -108
Oct 22 14:56:40 n305 kernel: LustreError:
125665:0:(vvp_io.c:1474:vvp_io_init()) lustre2: refresh file layout
[0x20000ef38:0xff55:0x0] error -108.
Oct 22 14:56:40 n305 kernel: LustreError:
125883:0:(ldlm_resource.c:1100:ldlm_resource_complain())
lustre2-MDT0000-mdc-ffff8c3f222c4800: namespace resource
[0x20000ef38:0xff55:0x0].0x0 (ffff8bdc6823c9c0) refcount nonzero (1) after
lock cleanup; forcing cleanup.
Oct 22 14:56:40 n305 kernel: LustreError:
125883:0:(ldlm_resource.c:1682:ldlm_resource_dump()) --- Resource:
[0x20000ef38:0xff55:0x0].0x0 (ffff8bdc6823c9c0) refcount = 1
Oct 22 14:56:40 n305 kernel: Lustre: lustre2-MDT0000-mdc-ffff8c3f222c4800:
Connection restored to 10.55.49.215@o2ib (at 10.55.49.215@o2ib)
Oct 22 14:56:40 n305 kernel: Lustre: Skipped 1 previous similar message
Oct 22 14:56:40 n305 kernel: LustreError:
125959:0:(file.c:3644:ll_inode_revalidate_fini()) Skipped 2 previous
similar messages

Server:
mds2-eno1: Oct 22 14:59:36 mds2 kernel: LustreError:
7182:0:(ldlm_lockd.c:697:ldlm_handle_ast_error()) ### client (nid
10.55.14.49@o2ib) failed to reply to blocking AST (req@ffff881b0e68b900
x1635734905828112 status 0 rc -110), evict it ns: mdt-lustre2-MDT0000_UUID
lock: ffff88187ec45e00/0x121438a5db957b5 lrc: 4/0,0 mode: PR/PR res:
[0x20000ef38:0xffec:0x0].0x0 bits 0x20 rrc: 4 type: IBT flags:
0x60200400000020 nid: 10.55.14.49@o2ib remote: 0x3154abaef2786884 expref:
72083 pid: 7182 timeout: 16143455124 lvb_type: 0
mds2-eno1: Oct 22 14:59:36 mds2 kernel: LustreError: 138-a:
lustre2-MDT0000: A client on nid 10.55.14.49@o2ib was evicted due to a lock
blocking callback time out: rc -110
mds2-eno1: Oct 22 14:59:36 mds2 kernel: Lustre: lustre2-MDT0000: Connection
restored to 3b42ec33-0885-6b7f-6575-9b200c4b6f55 (at 10.55.14.49@o2ib)
mds2-eno1: Oct 22 14:59:37 mds2 kernel: LustreError:
8936:0:(client.c:1166:ptlrpc_import_delay_req()) @@@ IMP_CLOSED
req@ffff881b0e68b900 x1635734905828176/t0(0)
o104->lustre2-MDT0000@10.55.14.49@o2ib:15/16 lens 296/224 e 0 to 0 dl 0 ref
1 fl Rpc:/0/ffffffff rc 0/-1
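
Reading the server-side messages, the MDS appears to evict the client because
it does not reply to the blocking AST (lock callback) within the timeout
(rc -110, i.e. ETIMEDOUT), and the client-side -107/-108 failures then follow
from that eviction. Since the job generates a lot of I/O, one thing I plan to
check is whether the client is holding a very large number of MDT locks when
the blocking callback arrives, along these lines (a sketch; parameter paths
can differ between versions, and the lru_size value below is only a guess):

  # on the client: how many MDT locks are currently held / cached in the LRU
  lctl get_param ldlm.namespaces.*MDT0000*.lock_count
  lctl get_param ldlm.namespaces.*MDT0000*.lru_size
  # optionally cap the lock LRU so lock cancellation stays cheap
  lctl set_param ldlm.namespaces.*MDT0000*.lru_size=2000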


Can anyone point me in the right direction on how to debug this issue?

Thanks,
-Raj