<div dir="ltr"><div dir="ltr">On Wed, Nov 13, 2019 at 4:28 AM Thomas Roth <<a href="mailto:t.roth@gsi.de">t.roth@gsi.de</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi all,<br>
<br>
we keep hitting LU-10697, which makes the users' experience quite painful.<br>
There was a related issue in Lustre 2.12/2.13 which is also unresolved; I can't find the LU ticket number at the moment.<br>
<br></blockquote><div><br></div><div>Thomas,</div><div><br></div><div>Perhaps you are looking for LU-12136?  Also in that ticket, LU-12018 is referenced which has a patch that _may_ reduce the likelihood of hitting the problem.</div><div><br></div><div>Regards,</div><div>Nathan</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
In any case, it always looks like<br>
<br>
Nov 13 10:23:58 <a href="http://lxmds19.gsi.de" rel="noreferrer" target="_blank">lxmds19.gsi.de</a> kernel: Pid: 6449, comm: mdt00_095 3.10.0-957.el7_lustre.x86_64 #1 SMP Wed Dec 12 15:03:08 UTC 2018<br>
Nov 13 10:23:58 <a href="http://lxmds19.gsi.de" rel="noreferrer" target="_blank">lxmds19.gsi.de</a> kernel: Call Trace:<br>
Nov 13 10:23:58 <a href="http://lxmds19.gsi.de" rel="noreferrer" target="_blank">lxmds19.gsi.de</a> kernel:  [<ffffffff87786cf7>] call_rwsem_down_write_failed+0x17/0x30<br>
Nov 13 10:23:58 <a href="http://lxmds19.gsi.de" rel="noreferrer" target="_blank">lxmds19.gsi.de</a> kernel:  [<ffffffffc1716f04>] lod_qos_prep_create+0xaa4/0x17f0 [lod]<br>
Nov 13 10:23:58 <a href="http://lxmds19.gsi.de" rel="noreferrer" target="_blank">lxmds19.gsi.de</a> kernel:  [<ffffffffc171818d>] lod_prepare_create+0x25d/0x360 [lod]<br>
Nov 13 10:23:58 <a href="http://lxmds19.gsi.de" rel="noreferrer" target="_blank">lxmds19.gsi.de</a> kernel:  [<ffffffffc170c9ae>] lod_declare_striped_create+0x1ee/0x970 [lod]<br>
Nov 13 10:23:58 <a href="http://lxmds19.gsi.de" rel="noreferrer" target="_blank">lxmds19.gsi.de</a> kernel:  [<ffffffffc170ee24>] lod_declare_create+0x1e4/0x540 [lod]<br>
Nov 13 10:23:58 <a href="http://lxmds19.gsi.de" rel="noreferrer" target="_blank">lxmds19.gsi.de</a> kernel:  [<ffffffffc177ab22>] mdd_declare_create_object_internal+0xe2/0x2f0 [mdd]<br>
Nov 13 10:23:58 <a href="http://lxmds19.gsi.de" rel="noreferrer" target="_blank">lxmds19.gsi.de</a> kernel:  [<ffffffffc176c1a3>] mdd_declare_create+0x53/0xe30 [mdd]<br>
Nov 13 10:23:58 <a href="http://lxmds19.gsi.de" rel="noreferrer" target="_blank">lxmds19.gsi.de</a> kernel:  [<ffffffffc1770059>] mdd_create+0x879/0x1400 [mdd]<br>
Nov 13 10:23:58 <a href="http://lxmds19.gsi.de" rel="noreferrer" target="_blank">lxmds19.gsi.de</a> kernel:  [<ffffffffc166acc5>] mdt_reint_open+0x2175/0x3190 [mdt]<br>
Nov 13 10:23:58 <a href="http://lxmds19.gsi.de" rel="noreferrer" target="_blank">lxmds19.gsi.de</a> kernel:  [<ffffffffc165fb43>] mdt_reint_rec+0x83/0x210 [mdt]<br>
Nov 13 10:23:58 <a href="http://lxmds19.gsi.de" rel="noreferrer" target="_blank">lxmds19.gsi.de</a> kernel:  [<ffffffffc164137b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]<br>
Nov 13 10:23:58 <a href="http://lxmds19.gsi.de" rel="noreferrer" target="_blank">lxmds19.gsi.de</a> kernel:  [<ffffffffc16418a2>] mdt_intent_reint+0x162/0x430 [mdt]<br>
Nov 13 10:23:58 <a href="http://lxmds19.gsi.de" rel="noreferrer" target="_blank">lxmds19.gsi.de</a> kernel:  [<ffffffffc164c681>] mdt_intent_policy+0x441/0xc70 [mdt]<br>
Nov 13 10:23:58 <a href="http://lxmds19.gsi.de" rel="noreferrer" target="_blank">lxmds19.gsi.de</a> kernel:  [<ffffffffc0f5d2ba>] ldlm_lock_enqueue+0x38a/0x980 [ptlrpc]<br>
Nov 13 10:23:58 <a href="http://lxmds19.gsi.de" rel="noreferrer" target="_blank">lxmds19.gsi.de</a> kernel:  [<ffffffffc0f86b53>] ldlm_handle_enqueue0+0x9d3/0x16a0 [ptlrpc]<br>
Nov 13 10:23:58 <a href="http://lxmds19.gsi.de" rel="noreferrer" target="_blank">lxmds19.gsi.de</a> kernel:  [<ffffffffc100c4f2>] tgt_enqueue+0x62/0x210 [ptlrpc]<br>
Nov 13 10:23:58 <a href="http://lxmds19.gsi.de" rel="noreferrer" target="_blank">lxmds19.gsi.de</a> kernel:  [<ffffffffc101042a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]<br>
Nov 13 10:23:58 <a href="http://lxmds19.gsi.de" rel="noreferrer" target="_blank">lxmds19.gsi.de</a> kernel:  [<ffffffffc0fb8e5b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]<br>
Nov 13 10:23:58 <a href="http://lxmds19.gsi.de" rel="noreferrer" target="_blank">lxmds19.gsi.de</a> kernel:  [<ffffffffc0fbc5a2>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]<br>
Nov 13 10:23:58 <a href="http://lxmds19.gsi.de" rel="noreferrer" target="_blank">lxmds19.gsi.de</a> kernel:  [<ffffffff874c1c31>] kthread+0xd1/0xe0<br>
Nov 13 10:23:58 <a href="http://lxmds19.gsi.de" rel="noreferrer" target="_blank">lxmds19.gsi.de</a> kernel:  [<ffffffff87b74c37>] ret_from_fork_nospec_end+0x0/0x39<br>
Nov 13 10:23:58 <a href="http://lxmds19.gsi.de" rel="noreferrer" target="_blank">lxmds19.gsi.de</a> kernel:  [<ffffffffffffffff>] 0xffffffffffffffff<br>
<br>
<br>
and at some point the MDS gives up:<br>
<br>
Nov 13 11:34:34 <a href="http://lxmds19.gsi.de" rel="noreferrer" target="_blank">lxmds19.gsi.de</a> kernel: LustreError: 6433:0:(ldlm_request.c:130:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1573640974, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-hebe-MDT0000_UUID lock: ffff996f423ad800/0xd20e202c72a0f5f4 lrc: 3/1,0 mode: --/PR res: [0x20000c8f0:0x3ad6:0x0].0x0 bits 0x13 rrc: 24 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 6433 timeout: 0 lvb_type: 0<br>
<br>
<br>
<br>
Is there a chance that these issues are fixed in 2.12/2.13?<br>
There seems to be no activity in LU-10697 at the moment, and that ticket is from last year anyway.<br>
The Jira ticket that I can't find anymore, which reports similar issues in 2.12, is at least from 2019.<br>
<br>
<br>
Regards,<br>
Thomas<br>
<br>
<br>
-- <br>
--------------------------------------------------------------------<br>
Thomas Roth<br>
Department: Informationstechnologie<br>
Location: SB3 2.291<br>
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986<br>
<br>
<br>
GSI Helmholtzzentrum für Schwerionenforschung GmbH<br>
Planckstraße 1, 64291 Darmstadt, Germany, <a href="http://www.gsi.de" rel="noreferrer" target="_blank">www.gsi.de</a><br>
</blockquote></div></div>