[lustre-discuss] Lustre 2.12.0 and locking problems

Riccardo Veraldi Riccardo.Veraldi at cnaf.infn.it
Tue Mar 5 11:49:51 PST 2019


Hello,

I have quite a big issue on my Lustre 2.12.0 MDS/MDT.

Clients moving data to the OSS run into a locking problem I have never
seen before.

The clients are mostly 2.10.5, except for one which is 2.12.0, but the
problem occurs regardless of the client version.

These are the errors I see on the MDS/MDT. When this happens
everything just hangs. If I reboot the MDS everything goes back to
normal, but it has already happened 2 times in 3 days and it is
disruptive.

Any hints?

Is it feasible to downgrade from 2.12.0 to 2.10.6?

Thanks

Mar  5 11:10:33 psmdsana1501 kernel: Lustre: 7898:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1551813033/real 1551813033] req@ffff9fdcbecd0300 x1626845000210688/t0(0) o104->ana15-MDT0000@172.21.52.87@o2ib:15/16 lens 296/224 e 0 to 1 dl 1551813044 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
Mar  5 11:10:33 psmdsana1501 kernel: Lustre: 7898:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 50552576 previous similar messages
Mar  5 11:13:03 psmdsana1501 kernel: LustreError: 7898:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 172.21.52.87@o2ib) failed to reply to blocking AST (req@ffff9fdcbecd0300 x1626845000210688 status 0 rc -110), evict it ns: mdt-ana15-MDT0000_UUID lock: ffff9fde9b6873c0/0x9824623d2148ef38 lrc: 4/0,0 mode: PR/PR res: [0x2000013a9:0x1d347:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 172.21.52.87@o2ib remote: 0xd8efecd6e7621e63 expref: 8 pid: 7898 timeout: 333081 lvb_type: 0
Mar  5 11:13:03 psmdsana1501 kernel: LustreError: 138-a: ana15-MDT0000: A client on nid 172.21.52.87@o2ib was evicted due to a lock blocking callback time out: rc -110
Mar  5 11:13:03 psmdsana1501 kernel: LustreError: 5321:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 172.21.52.87@o2ib ns: mdt-ana15-MDT0000_UUID lock: ffff9fde9b6873c0/0x9824623d2148ef38 lrc: 3/0,0 mode: PR/PR res: [0x2000013a9:0x1d347:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 172.21.52.87@o2ib remote: 0xd8efecd6e7621e63 expref: 9 pid: 7898 timeout: 0 lvb_type: 0
Mar  5 11:13:04 psmdsana1501 kernel: Lustre: ana15-MDT0000: Connection restored to 59c5a826-f4e9-0dd0-8d4f-08c204f25941 (at 172.21.52.87@o2ib)
Mar  5 11:15:34 psmdsana1501 kernel: LustreError: 7898:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 172.21.52.142@o2ib) failed to reply to blocking AST (req@ffff9fde2d393600 x1626845000213776 status 0 rc -110), evict it ns: mdt-ana15-MDT0000_UUID lock: ffff9fde9b6858c0/0x9824623d2148efee lrc: 4/0,0 mode: PR/PR res: [0x2000013ac:0x1:0x0].0x0 bits 0x13/0x0 rrc: 3 type: IBT flags: 0x60200400000020 nid: 172.21.52.142@o2ib remote: 0xbb35541ea6663082 expref: 9 pid: 7898 timeout: 333232 lvb_type: 0
Mar  5 11:15:34 psmdsana1501 kernel: LustreError: 138-a: ana15-MDT0000: A client on nid 172.21.52.142@o2ib was evicted due to a lock blocking callback time out: rc -110
Mar  5 11:15:34 psmdsana1501 kernel: LustreError: 5321:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 151s: evicting client at 172.21.52.142@o2ib ns: mdt-ana15-MDT0000_UUID lock: ffff9fde9b6858c0/0x9824623d2148efee lrc: 3/0,0 mode: PR/PR res: [0x2000013ac:0x1:0x0].0x0 bits 0x13/0x0 rrc: 3 type: IBT flags: 0x60200400000020 nid: 172.21.52.142@o2ib remote: 0xbb35541ea6663082 expref: 10 pid: 7898 timeout: 0 lvb_type: 0
Mar  5 11:15:34 psmdsana1501 kernel: Lustre: ana15-MDT0000: Connection restored to 9d49a115-646b-c006-fd85-000a4b90019a (at 172.21.52.142@o2ib)
Mar  5 11:20:33 psmdsana1501 kernel: Lustre: 7898:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1551813633/real 1551813633] req@ffff9fdcc2a95100 x1626845000222624/t0(0) o104->ana15-MDT0000@172.21.52.87@o2ib:15/16 lens 296/224 e 0 to 1 dl 1551813644 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1
Mar  5 11:20:33 psmdsana1501 kernel: Lustre: 7898:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 23570550 previous similar messages
Mar  5 11:22:46 psmdsana1501 kernel: LustreError: 7898:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 172.21.52.87@o2ib) failed to reply to blocking AST (req@ffff9fdcc2a95100 x1626845000222624 status 0 rc -110), evict it ns: mdt-ana15-MDT0000_UUID lock: ffff9fde86ffdf80/0x9824623d2148f23a lrc: 4/0,0 mode: PR/PR res: [0x2000013ae:0x1:0x0].0x0 bits 0x13/0x0 rrc: 3 type: IBT flags: 0x60200400000020 nid: 172.21.52.87@o2ib remote: 0xd8efecd6e7621eb7 expref: 9 pid: 7898 timeout: 333665 lvb_type: 0
Mar  5 11:22:46 psmdsana1501 kernel: LustreError: 138-a: ana15-MDT0000: A client on nid 172.21.52.87@o2ib was evicted due to a lock blocking callback time out: rc -110
Mar  5 11:22:46 psmdsana1501 kernel: LustreError: 5321:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 172.21.52.87@o2ib ns: mdt-ana15-MDT0000_UUID lock: ffff9fde86ffdf80/0x9824623d2148f23a lrc: 3/0,0 mode: PR/PR res: [0x2000013ae:0x1:0x0].0x0 bits 0x13/0x0 rrc: 3 type: IBT flags: 0x60200400000020 nid: 172.21.52.87@o2ib remote: 0xd8efecd6e7621eb7 expref: 10 pid: 7898 timeout: 0 lvb_type: 0
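
To see which clients are evicted most often, the "138-a" eviction
notices in a log like the one above could be tallied with a small
script. This is only a minimal sketch: the helper name count_evictions
and the regular expression are assumptions based on the messages shown
here, not anything shipped with Lustre, and the message wording may
differ between Lustre versions.

```python
import re
from collections import Counter

# Matches the "138-a" eviction notice seen in the log above.
# Accepts both the real "ip@net" NID form and the "ip at net" form
# produced by mailing-list archives (assumption: IPv4 NIDs only).
EVICT_RE = re.compile(
    r"A client on nid (\d+\.\d+\.\d+\.\d+)(?:@| at )(\w+) was evicted"
)


def count_evictions(log_text):
    """Return a Counter mapping client NID -> number of eviction notices."""
    # Collapse all whitespace so a message wrapped across several
    # lines (as in an email) still matches as one string.
    flat = " ".join(log_text.split())
    return Counter(f"{ip}@{net}" for ip, net in EVICT_RE.findall(flat))


if __name__ == "__main__":
    import sys

    # Usage: python count_evictions.py < saved_kernel_log.txt
    for nid, n in count_evictions(sys.stdin.read()).most_common():
        print(f"{n:6d}  {nid}")
```

Feeding it a saved copy of the MDS kernel log on stdin would print one
line per client NID with its eviction count, which may help tell
whether the problem follows particular clients or is spread evenly.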



