[lustre-discuss] Lustre 2.12.0 and locking problems

Patrick Farrell pfarrell at whamcloud.com
Tue Mar 5 12:14:05 PST 2019


Riccardo,

Since 2.12 is still a relatively new maintenance release, it would be helpful if you could open an LU ticket and provide more detail there, such as what the clients were doing, whether you were using any new features (like DoM or FLR), and the full dmesg from the clients and servers involved in these evictions.
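
As a starting point for that detail, the eviction messages in the log quoted below can be tallied per client NID. The following is a minimal sketch, not a Lustre tool: the names `SAMPLE`, `EVICT_RE`, and `summarize_evictions` are hypothetical, the sample text is copied from the quoted log, and the pattern assumes the "138-a" eviction message format seen there (including the list archive's " at " obfuscation of "@").

```python
import re
from collections import Counter

# Eviction messages copied from the quoted log, still wrapped by the
# mail client and with "@" obfuscated as " at " by the list archive.
SAMPLE = """\
Mar  5 11:13:03 psmdsana1501 kernel: LustreError: 138-a: ana15-MDT0000:
A client on nid 172.21.52.87 at o2ib was evicted due to a lock blocking
callback time out: rc -110
Mar  5 11:15:34 psmdsana1501 kernel: LustreError: 138-a: ana15-MDT0000:
A client on nid 172.21.52.142 at o2ib was evicted due to a lock blocking
callback time out: rc -110
Mar  5 11:22:46 psmdsana1501 kernel: LustreError: 138-a: ana15-MDT0000:
A client on nid 172.21.52.87 at o2ib was evicted due to a lock blocking
callback time out: rc -110
"""

# Assumed message shape; accepts both "addr@net" and "addr at net".
EVICT_RE = re.compile(
    r"(\w{3} \d+ \d{2}:\d{2}:\d{2}) \S+ kernel: LustreError: 138-a: "
    r"(\S+): A client on nid (\S+)(?: at |@)(\S+) was evicted"
)


def summarize_evictions(text):
    """Return (timestamp, target, nid) tuples for each eviction message."""
    # Collapse all whitespace so wrapped messages match as one string.
    flat = " ".join(text.split())
    return [
        (m.group(1), m.group(2), f"{m.group(3)}@{m.group(4)}")
        for m in EVICT_RE.finditer(flat)
    ]


if __name__ == "__main__":
    rows = summarize_evictions(SAMPLE)
    for nid, count in Counter(nid for _, _, nid in rows).most_common():
        print(f"{nid}: {count} eviction(s)")
```

Run against a full dmesg or syslog capture instead of `SAMPLE`, a per-NID count like this makes it easier to see whether the evictions follow particular clients.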

- Patrick

On 3/5/19, 11:50 AM, "lustre-discuss on behalf of Riccardo Veraldi" <lustre-discuss-bounces at lists.lustre.org on behalf of Riccardo.Veraldi at cnaf.infn.it> wrote:

    Hello,
    
    I have quite a big issue on my Lustre 2.12.0 MDS/MDT.
    
    Clients moving data to the OSS run into a locking problem I have never 
    seen before.
    
    The clients are mostly 2.10.5, except for one which is 2.12.0, but 
    the problem occurs regardless of the client version.
    
    These are the errors I see on the MDS/MDT. When this happens, 
    everything just hangs. If I reboot the MDS, everything returns to 
    normal, but it has already happened twice in 3 days and it is disruptive.
    
    Any hints?
    
    Is it feasible to downgrade from 2.12.0 to 2.10.6?
    
    thanks
    
    Mar  5 11:10:33 psmdsana1501 kernel: Lustre: 
    7898:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has 
    failed due to network error: [sent 1551813033/real 1551813033] 
    req at ffff9fdcbecd0300 x1626845000210688/t0(0) 
    o104->ana15-MDT0000 at 172.21.52.87@o2ib:15/16 lens 296/224 e 0 to 1 dl 
    1551813044 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
    Mar  5 11:10:33 psmdsana1501 kernel: Lustre: 
    7898:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 50552576 
    previous similar messages
    Mar  5 11:13:03 psmdsana1501 kernel: LustreError: 
    7898:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 
    172.21.52.87 at o2ib) failed to reply to blocking AST (req at ffff9fdcbecd0300 
    x1626845000210688 status 0 rc -110), evict it ns: mdt-ana15-MDT0000_UUID 
    lock: ffff9fde9b6873c0/0x9824623d2148ef38 lrc: 4/0,0 mode: PR/PR res: 
    [0x2000013a9:0x1d347:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 
    0x60200400000020 nid: 172.21.52.87 at o2ib remote: 0xd8efecd6e7621e63 
    expref: 8 pid: 7898 timeout: 333081 lvb_type: 0
    Mar  5 11:13:03 psmdsana1501 kernel: LustreError: 138-a: ana15-MDT0000: 
    A client on nid 172.21.52.87 at o2ib was evicted due to a lock blocking 
    callback time out: rc -110
    Mar  5 11:13:03 psmdsana1501 kernel: LustreError: 
    5321:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer 
    expired after 150s: evicting client at 172.21.52.87 at o2ib ns: 
    mdt-ana15-MDT0000_UUID lock: ffff9fde9b6873c0/0x9824623d2148ef38 lrc: 
    3/0,0 mode: PR/PR res: [0x2000013a9:0x1d347:0x0].0x0 bits 0x13/0x0 rrc: 
    5 type: IBT flags: 0x60200400000020 nid: 172.21.52.87 at o2ib remote: 
    0xd8efecd6e7621e63 expref: 9 pid: 7898 timeout: 0 lvb_type: 0
    Mar  5 11:13:04 psmdsana1501 kernel: Lustre: ana15-MDT0000: Connection 
    restored to 59c5a826-f4e9-0dd0-8d4f-08c204f25941 (at 172.21.52.87 at o2ib)
    Mar  5 11:15:34 psmdsana1501 kernel: LustreError: 
    7898:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 
    172.21.52.142 at o2ib) failed to reply to blocking AST 
    (req at ffff9fde2d393600 x1626845000213776 status 0 rc -110), evict it ns: 
    mdt-ana15-MDT0000_UUID lock: ffff9fde9b6858c0/0x9824623d2148efee lrc: 
    4/0,0 mode: PR/PR res: [0x2000013ac:0x1:0x0].0x0 bits 0x13/0x0 rrc: 3 
    type: IBT flags: 0x60200400000020 nid: 172.21.52.142 at o2ib remote: 
    0xbb35541ea6663082 expref: 9 pid: 7898 timeout: 333232 lvb_type: 0
    Mar  5 11:15:34 psmdsana1501 kernel: LustreError: 138-a: ana15-MDT0000: 
    A client on nid 172.21.52.142 at o2ib was evicted due to a lock blocking 
    callback time out: rc -110
    Mar  5 11:15:34 psmdsana1501 kernel: LustreError: 
    5321:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer 
    expired after 151s: evicting client at 172.21.52.142 at o2ib ns: 
    mdt-ana15-MDT0000_UUID lock: ffff9fde9b6858c0/0x9824623d2148efee lrc: 
    3/0,0 mode: PR/PR res: [0x2000013ac:0x1:0x0].0x0 bits 0x13/0x0 rrc: 3 
    type: IBT flags: 0x60200400000020 nid: 172.21.52.142 at o2ib remote: 
    0xbb35541ea6663082 expref: 10 pid: 7898 timeout: 0 lvb_type: 0
    Mar  5 11:15:34 psmdsana1501 kernel: Lustre: ana15-MDT0000: Connection 
    restored to 9d49a115-646b-c006-fd85-000a4b90019a (at 172.21.52.142 at o2ib)
    Mar  5 11:20:33 psmdsana1501 kernel: Lustre: 
    7898:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has 
    failed due to network error: [sent 1551813633/real 1551813633] 
    req at ffff9fdcc2a95100 x1626845000222624/t0(0) 
    o104->ana15-MDT0000 at 172.21.52.87@o2ib:15/16 lens 296/224 e 0 to 1 dl 
    1551813644 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1
    Mar  5 11:20:33 psmdsana1501 kernel: Lustre: 
    7898:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 23570550 
    previous similar messages
    Mar  5 11:22:46 psmdsana1501 kernel: LustreError: 
    7898:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 
    172.21.52.87 at o2ib) failed to reply to blocking AST (req at ffff9fdcc2a95100 
    x1626845000222624 status 0 rc -110), evict it ns: mdt-ana15-MDT0000_UUID 
    lock: ffff9fde86ffdf80/0x9824623d2148f23a lrc: 4/0,0 mode: PR/PR res: 
    [0x2000013ae:0x1:0x0].0x0 bits 0x13/0x0 rrc: 3 type: IBT flags: 
    0x60200400000020 nid: 172.21.52.87 at o2ib remote: 0xd8efecd6e7621eb7 
    expref: 9 pid: 7898 timeout: 333665 lvb_type: 0
    Mar  5 11:22:46 psmdsana1501 kernel: LustreError: 138-a: ana15-MDT0000: 
    A client on nid 172.21.52.87 at o2ib was evicted due to a lock blocking 
    callback time out: rc -110
    Mar  5 11:22:46 psmdsana1501 kernel: LustreError: 
    5321:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer 
    expired after 150s: evicting client at 172.21.52.87 at o2ib ns: 
    mdt-ana15-MDT0000_UUID lock: ffff9fde86ffdf80/0x9824623d2148f23a lrc: 
    3/0,0 mode: PR/PR res: [0x2000013ae:0x1:0x0].0x0 bits 0x13/0x0 rrc: 3 
    type: IBT flags: 0x60200400000020 nid: 172.21.52.87 at o2ib remote: 
    0xd8efecd6e7621eb7 expref: 10 pid: 7898 timeout: 0 lvb_type: 0
    
    