[lustre-discuss] MDT deadlocks LU-10697

Colin Faber cfaber at gmail.com
Wed Nov 13 09:20:16 PST 2019


Which kernel are you running?

https://access.redhat.com/solutions/3393611

On Wed, Nov 13, 2019 at 4:28 AM Thomas Roth <t.roth at gsi.de> wrote:

> Hi all,
>
> we keep hitting LU-10697, which makes the users' experience quite painful.
> There was a related issue in Lustre 2.12/2.13, which is also unresolved;
> I can't find the LU- ticket at the moment.
>
> In any case, the stack trace always looks like this:
>
> Nov 13 10:23:58 lxmds19.gsi.de kernel: Pid: 6449, comm: mdt00_095 3.10.0-957.el7_lustre.x86_64 #1 SMP Wed Dec 12 15:03:08 UTC 2018
> Nov 13 10:23:58 lxmds19.gsi.de kernel: Call Trace:
> Nov 13 10:23:58 lxmds19.gsi.de kernel:  [<ffffffff87786cf7>] call_rwsem_down_write_failed+0x17/0x30
> Nov 13 10:23:58 lxmds19.gsi.de kernel:  [<ffffffffc1716f04>] lod_qos_prep_create+0xaa4/0x17f0 [lod]
> Nov 13 10:23:58 lxmds19.gsi.de kernel:  [<ffffffffc171818d>] lod_prepare_create+0x25d/0x360 [lod]
> Nov 13 10:23:58 lxmds19.gsi.de kernel:  [<ffffffffc170c9ae>] lod_declare_striped_create+0x1ee/0x970 [lod]
> Nov 13 10:23:58 lxmds19.gsi.de kernel:  [<ffffffffc170ee24>] lod_declare_create+0x1e4/0x540 [lod]
> Nov 13 10:23:58 lxmds19.gsi.de kernel:  [<ffffffffc177ab22>] mdd_declare_create_object_internal+0xe2/0x2f0 [mdd]
> Nov 13 10:23:58 lxmds19.gsi.de kernel:  [<ffffffffc176c1a3>] mdd_declare_create+0x53/0xe30 [mdd]
> Nov 13 10:23:58 lxmds19.gsi.de kernel:  [<ffffffffc1770059>] mdd_create+0x879/0x1400 [mdd]
> Nov 13 10:23:58 lxmds19.gsi.de kernel:  [<ffffffffc166acc5>] mdt_reint_open+0x2175/0x3190 [mdt]
> Nov 13 10:23:58 lxmds19.gsi.de kernel:  [<ffffffffc165fb43>] mdt_reint_rec+0x83/0x210 [mdt]
> Nov 13 10:23:58 lxmds19.gsi.de kernel:  [<ffffffffc164137b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
> Nov 13 10:23:58 lxmds19.gsi.de kernel:  [<ffffffffc16418a2>] mdt_intent_reint+0x162/0x430 [mdt]
> Nov 13 10:23:58 lxmds19.gsi.de kernel:  [<ffffffffc164c681>] mdt_intent_policy+0x441/0xc70 [mdt]
> Nov 13 10:23:58 lxmds19.gsi.de kernel:  [<ffffffffc0f5d2ba>] ldlm_lock_enqueue+0x38a/0x980 [ptlrpc]
> Nov 13 10:23:58 lxmds19.gsi.de kernel:  [<ffffffffc0f86b53>] ldlm_handle_enqueue0+0x9d3/0x16a0 [ptlrpc]
> Nov 13 10:23:58 lxmds19.gsi.de kernel:  [<ffffffffc100c4f2>] tgt_enqueue+0x62/0x210 [ptlrpc]
> Nov 13 10:23:58 lxmds19.gsi.de kernel:  [<ffffffffc101042a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
> Nov 13 10:23:58 lxmds19.gsi.de kernel:  [<ffffffffc0fb8e5b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
> Nov 13 10:23:58 lxmds19.gsi.de kernel:  [<ffffffffc0fbc5a2>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
> Nov 13 10:23:58 lxmds19.gsi.de kernel:  [<ffffffff874c1c31>] kthread+0xd1/0xe0
> Nov 13 10:23:58 lxmds19.gsi.de kernel:  [<ffffffff87b74c37>] ret_from_fork_nospec_end+0x0/0x39
> Nov 13 10:23:58 lxmds19.gsi.de kernel:  [<ffffffffffffffff>] 0xffffffffffffffff
>
>
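A trace like the one above is easier to compare across incidents when it is reduced to a list of functions and modules. The following is a minimal Python sketch, assuming the trace has been saved as plain text with one frame per line; the names `FRAME_RE`, `parse_frames` and `SAMPLE` are hypothetical, and `SAMPLE` is only an abridged excerpt of the quoted trace. The innermost frame, `call_rwsem_down_write_failed`, is the kernel's slow path for acquiring a read-write semaphore for writing, which suggests the MDT service thread was blocked waiting on such a semaphore taken in `lod_qos_prep_create` (module `lod`).

```python
import re
from typing import List, Optional, Tuple

# One kernel stack frame, e.g.
#   [<ffffffffc1716f04>] lod_qos_prep_create+0xaa4/0x17f0 [lod]
# Captures the return address, function name, offset, function size and the
# optional module name in brackets (absent for core kernel functions).
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:[ \t]+\[(?P<mod>\w+)\])?"
)


def parse_frames(trace: str) -> List[Tuple[str, Optional[str]]]:
    """Return (function, module) pairs, innermost frame first.

    Frames without a symbol (such as the trailing 0xffffffffffffffff
    terminator) do not match the pattern and are skipped.
    """
    return [(m.group("func"), m.group("mod")) for m in FRAME_RE.finditer(trace)]


# Hypothetical input: an abridged excerpt of the trace quoted above.
SAMPLE = """\
kernel:  [<ffffffff87786cf7>] call_rwsem_down_write_failed+0x17/0x30
kernel:  [<ffffffffc1716f04>] lod_qos_prep_create+0xaa4/0x17f0 [lod]
kernel:  [<ffffffffc171818d>] lod_prepare_create+0x25d/0x360 [lod]
kernel:  [<ffffffffc166acc5>] mdt_reint_open+0x2175/0x3190 [mdt]
kernel:  [<ffffffff874c1c31>] kthread+0xd1/0xe0
kernel:  [<ffffffffffffffff>] 0xffffffffffffffff
"""

if __name__ == "__main__":
    # Print one frame per line; core kernel frames have no module name.
    for func, mod in parse_frames(SAMPLE):
        print(f"{func:32} {mod or 'kernel'}")
```

Matching on the function name rather than on the raw addresses keeps the comparison stable across reboots, since module load addresses change.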
> and at some point the lock enqueue on the MDS times out:
>
> Nov 13 11:34:34 lxmds19.gsi.de kernel: LustreError: 6433:0:(ldlm_request.c:130:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1573640974, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-hebe-MDT0000_UUID lock: ffff996f423ad800/0xd20e202c72a0f5f4 lrc: 3/1,0 mode: --/PR res: [0x20000c8f0:0x3ad6:0x0].0x0 bits 0x13 rrc: 24 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 6433 timeout: 0 lvb_type: 0
>
>
>
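The timeout message records the enqueue time as a Unix epoch (`1573640974`) together with the elapsed time (`300s ago`), which makes it possible to line the lock up against other log entries. A minimal Python sketch with a hypothetical helper `enqueue_window` is below; it assumes the MDS writes syslog timestamps in CET (UTC+1, the local time in Darmstadt in November).

```python
import re
from datetime import datetime, timedelta, timezone

# Excerpt of the LustreError message quoted above.
MSG = ("### lock timed out (enqueued at 1573640974, 300s ago); "
       "not entering recovery in server code, just going back to sleep")


def enqueue_window(msg: str):
    """Return (enqueue time, time of the report) as UTC datetimes."""
    m = re.search(r"enqueued at (\d+), (\d+)s ago", msg)
    if m is None:
        raise ValueError("no 'enqueued at' timestamp in message")
    enqueued = datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
    reported = enqueued + timedelta(seconds=int(m.group(2)))
    return enqueued, reported


if __name__ == "__main__":
    cet = timezone(timedelta(hours=1))  # assumption: the MDS logs in CET (UTC+1)
    enqueued, reported = enqueue_window(MSG)
    print("enqueued:", enqueued.astimezone(cet).isoformat())
    print("reported:", reported.astimezone(cet).isoformat())
```

The epoch converts to 10:29:34 UTC, i.e. 11:29:34 CET, and adding the 300 s gives 11:34:34, which matches the syslog timestamp on the LustreError line. The lock was therefore enqueued roughly an hour after the 10:23:58 stack trace, so it is worth searching the log around 11:29 for the request that triggered it.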
> Is there a chance that these issues are fixed in 2.12/2.13?
> There seems to be no activity on LU-10697 at the moment, and that ticket
> is from last year anyway.
> The other Jira ticket, the one I can't find anymore that reports similar
> issues in 2.12, is at least from 2019.
>
>
> Regards,
> Thomas
>
>
> --
> --------------------------------------------------------------------
> Thomas Roth
> Department: Informationstechnologie
> Location: SB3 2.291
> Phone: +49-6159-71 1453  Fax: +49-6159-71 2986
>
>
> GSI Helmholtzzentrum für Schwerionenforschung GmbH
> Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de