<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>Andreas,</p>
    <p>Thanks for the quick reply.  The client version is 2.14.0_ddn173.
      The server reports the same version, target_version: 2.14.0.173.
      This originally started as the result of a user input error that
      requested an OST that does not exist.  For my simple test case I
      request an OST that does not exist, and probably never will exist.
      This issue is on Pleiades at NAS/NASA, where the configuration
      doesn't change very much, so I doubt that this is related to an
      OST or MDT that may have been recently added.
    </p>
    <p>The admins are checking on LU-17334.</p>
    <p>The admins also noticed thousands of error messages in dmesg:
    </p>
    <p class="MsoNormal" style="text-autospace:none"><span
        style="font-size:9.0pt;font-family:'Lucida Console'">[root@r593i4n16
        ~]# dmesg -T |grep LustreError</span></p>
    <p class="MsoNormal" style="text-autospace:none"><span
        style="font-size:9.0pt;font-family:'Lucida Console'">[Wed
        Apr  9 15:36:22 2025] LustreError: 11-0:
        nbp17-MDT0000-mdc-ffff963283f77000: operation ldlm_enqueue to
        node 10.151.27.142@o2ib failed: rc = -19</span></p>
    <p class="MsoNormal" style="text-autospace:none"><span
        style="font-size:9.0pt;font-family:'Lucida Console'">[Wed
        Apr  9 15:36:23 2025] LustreError: 11-0:
        nbp17-MDT0000-mdc-ffff963283f77000: operation ldlm_enqueue to
        node 10.151.27.142@o2ib failed: rc = -19</span></p>
    <p class="MsoNormal" style="text-autospace:none"><span
        style="font-size:9.0pt;font-family:'Lucida Console'">[Wed
        Apr  9 15:36:23 2025] LustreError: Skipped 1709 previous similar
        messages</span></p>
    <p class="MsoNormal" style="text-autospace:none"><span
        style="font-size:9.0pt;font-family:'Lucida Console'">[Wed
        Apr  9 15:36:24 2025] LustreError: 11-0:
        nbp17-MDT0000-mdc-ffff963283f77000: operation ldlm_enqueue to
        node 10.151.27.142@o2ib failed: rc = -19</span></p>
    <p class="MsoNormal" style="text-autospace:none"><span
        style="font-size:9.0pt;font-family:'Lucida Console'">[Wed
        Apr  9 15:36:24 2025] LustreError: Skipped 3491 previous similar
        messages</span></p>
    <p class="MsoNormal" style="text-autospace:none"><span
        style="font-size:9.0pt;font-family:'Lucida Console'">[Wed
        Apr  9 15:36:26 2025] LustreError: 11-0:
        nbp17-MDT0000-mdc-ffff963283f77000: operation ldlm_enqueue to
        node 10.151.27.142@o2ib failed: rc = -19</span></p>
    <p class="MsoNormal" style="text-autospace:none"><span
        style="font-size:9.0pt;font-family:'Lucida Console'">[Wed
        Apr  9 15:36:26 2025] LustreError: Skipped 7803 previous similar
        messages</span></p>
    <p class="MsoNormal" style="text-autospace:none"><span
        style="font-size:9.0pt;font-family:'Lucida Console'">[Wed
        Apr  9 15:36:30 2025] LustreError: 11-0:
        nbp17-MDT0000-mdc-ffff963283f77000: operation ldlm_enqueue to
        node 10.151.27.142@o2ib failed: rc = -19</span></p>
    <p class="MsoNormal" style="text-autospace:none"><span
        style="font-size:9.0pt;font-family:'Lucida Console'">[Wed
        Apr  9 15:36:30 2025] LustreError: Skipped 14891 previous
        similar messages</span></p>
    <p class="MsoNormal" style="text-autospace:none"><span
        style="font-size:9.0pt;font-family:'Lucida Console'">[Wed
        Apr  9 15:36:38 2025] LustreError: 11-0:
        nbp17-MDT0000-mdc-ffff963283f77000: operation ldlm_enqueue to
        node 10.151.27.142@o2ib failed: rc = -19</span></p>
    <p class="MsoNormal" style="text-autospace:none"><span
        style="font-size:9.0pt;font-family:'Lucida Console'">[Wed
        Apr  9 15:36:38 2025] LustreError: Skipped 29887 previous
        similar messages</span></p>
    <p class="MsoNormal" style="text-autospace:none"><span
        style="font-size:9.0pt;font-family:'Lucida Console'">[Wed
        Apr  9 15:36:54 2025] LustreError: 11-0:
        nbp17-MDT0000-mdc-ffff963283f77000: operation ldlm_enqueue to
        node 10.151.27.142@o2ib failed: rc = -19</span></p>
    <p class="MsoNormal" style="text-autospace:none"><span
        style="font-size:9.0pt;font-family:'Lucida Console'">[Wed
        Apr  9 15:36:54 2025] LustreError: Skipped 63032 previous
        similar messages</span></p>
    <p class="MsoNormal" style="text-autospace:none"><span
        style="font-size:9.0pt;font-family:'Lucida Console'">[Wed
        Apr  9 15:37:26 2025] LustreError: 11-0:
        nbp17-MDT0000-mdc-ffff963283f77000: operation ldlm_enqueue to
        node 10.151.27.142@o2ib failed: rc = -19</span></p>
    <p class="MsoNormal" style="text-autospace:none"><span
        style="font-size:9.0pt;font-family:'Lucida Console'">[Wed
        Apr  9 15:37:26 2025] LustreError: Skipped 120772 previous
        similar messages</span></p>
    <p class="MsoNormal" style="text-autospace:none"><span
        style="font-size:9.0pt;font-family:'Lucida Console'">[Wed
        Apr  9 15:38:30 2025] LustreError: 11-0:
        nbp17-MDT0000-mdc-ffff963283f77000: operation ldlm_enqueue to
        node 10.151.27.142@o2ib failed: rc = -19</span></p>
    <p class="MsoNormal" style="text-autospace:none"><span
        style="font-size:9.0pt;font-family:'Lucida Console'">[Wed
        Apr  9 15:38:30 2025] LustreError: Skipped 238498 previous
        similar messages</span></p>
    <p class="MsoNormal" style="text-autospace:none"><span
        style="font-size:9.0pt;font-family:'Lucida Console'">[Wed
        Apr  9 15:40:38 2025] LustreError: 11-0:
        nbp17-MDT0000-mdc-ffff963283f77000: operation ldlm_enqueue to
        node 10.151.27.142@o2ib failed: rc = -19</span></p>
    <p class="MsoNormal" style="text-autospace:none"><span
        style="font-size:9.0pt;font-family:'Lucida Console'">[Wed
        Apr  9 15:40:38 2025] LustreError: Skipped 515538 previous
        similar messages</span></p>
    <p class="MsoNormal" style="text-autospace:none"><span
        style="font-size:9.0pt;font-family:'Lucida Console'">[Wed
        Apr  9 15:44:54 2025] LustreError: 11-0:
        nbp17-MDT0000-mdc-ffff963283f77000: operation ldlm_enqueue to
        node 10.151.27.142@o2ib failed: rc = -19</span></p>
    <p class="MsoNormal" style="text-autospace:none"><span
        style="font-size:9.0pt;font-family:'Lucida Console'">[Wed
        Apr  9 15:44:54 2025] LustreError: Skipped 1040417 previous
        similar messages</span></p>
    <p class="MsoNormal" style="text-autospace:none"><span
        style="font-size:9.0pt;font-family:'Lucida Console'">[root@r593i4n16
        ~]#</span></p>
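    <p>To put a number on "thousands", here is a rough sketch of how the
      output above could be tallied.  It assumes each "Skipped N previous
      similar messages" count covers only the messages suppressed since
      the previous notice, in which case the ten notices above would add
      up to 2,036,038 skipped messages.  It also assumes the rc value is
      a negated Linux errno, so -19 would correspond to ENODEV.  The
      names LOG, SKIPPED, FAILED and summarize are only illustrative, and
      LOG holds just three of the lines from above.</p>

```python
import errno
import re

# Three lines copied from the dmesg output above (an excerpt, not the full log).
LOG = """\
[Wed Apr  9 15:36:22 2025] LustreError: 11-0: nbp17-MDT0000-mdc-ffff963283f77000: operation ldlm_enqueue to node 10.151.27.142@o2ib failed: rc = -19
[Wed Apr  9 15:36:23 2025] LustreError: Skipped 1709 previous similar messages
[Wed Apr  9 15:44:54 2025] LustreError: Skipped 1040417 previous similar messages
"""

# Rate-limit notices and printed failures, matched on the text shown in the log.
SKIPPED = re.compile(r"LustreError: Skipped (\d+) previous similar messages")
FAILED = re.compile(r"LustreError: .* failed: rc = (-?\d+)")


def summarize(log_text):
    """Return (printed, skipped, rc_names) for a block of dmesg lines."""
    printed = 0
    skipped = 0
    rc_names = set()
    for line in log_text.splitlines():
        m = SKIPPED.search(line)
        if m:
            # Assumption: counts are per-notice, not cumulative.
            skipped += int(m.group(1))
            continue
        m = FAILED.search(line)
        if m:
            printed += 1
            # Assumption: rc is a negated errno value.
            rc = abs(int(m.group(1)))
            rc_names.add(errno.errorcode.get(rc, "unknown"))
    return printed, skipped, rc_names


if __name__ == "__main__":
    printed, skipped, rc_names = summarize(LOG)
    print(f"printed={printed} skipped={skipped} total={printed + skipped}")
    print("rc names:", sorted(rc_names))
```

    <p>Feeding it the full dmesg output instead of the excerpt would give
      the total for the whole window.</p>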
    <p>John<br>
    </p>
    <div class="moz-cite-prefix">On 4/9/2025 4:58 PM, Andreas Dilger
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:EC451032-27D8-4A93-9504-BA40CF9E9CEB@ddn.com">
      On Apr 9, 2025, at 14:28, John Bauer via lustre-discuss
      <a class="moz-txt-link-rfc2396E" href="mailto:lustre-discuss@lists.lustre.org">&lt;lustre-discuss@lists.lustre.org&gt;</a> wrote:
      <div>
        <blockquote type="cite">
          <div>
            <div>
              <p>I have created a small reproducer program (81 lines of
                code) that results in a process that appears to hang in
                the kernel, accumulating cpu time.  The process is
                unresponsive to kill commands.  From gdb backtrace, it
                appears the call is stuck somewhere in fsetxattr() which
                is called by llapi_layout_file_open().  The problem
                happens only when a non-existent ost is added to the
                layout with a call to llapi_layout_ost_index_set().  The
                call to llapi_layout_sanity(), just before calling
                llapi_layout_file_open(), returns 0.  Is this a known
                issue? </p>
            </div>
          </div>
        </blockquote>
      </div>
      <div>Hard to say for sure.</div>
      <div><br>
      </div>
      <div>I suspect this is related to LU-17334, which relates to
        newly-added MDTs and OSTs in the filesystem. There were a few
        patches which recently landed in 2.16.0 (and backported) that
        will sleep and retry for a short time to handle the case where a
        client accesses a file or directory layout that references an
        OST or MDT that it doesn't know about.  The assumption is that
        the OST/MDT is newly added and the configuration update hasn't
        quite made it to the client yet.  The client should retry to
        contact the new server for some time before giving up and
        returning an error (in case the layout is actually bad).</div>
      <div><br>
      </div>
      <div>Whether this is fixed in your version depends on what the
        version is (not mentioned in your email).  It may also be
        important what the server version is, which can be seen from
        "lctl get_param mdc.*.import | grep target_version", if you can
        access this parameter.  If your client &amp; server versions
        have the LU-17334 fixes, then this would be unexpected, and if
        older versions then I'd say it is something I'd rather not
        revisit until the known fixes are in place.</div>
      <br>
      <div>
        <div dir="auto"
style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;">
          <div dir="auto"
style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;">
            <div>Cheers, Andreas</div>
            <div>—</div>
            <div>Andreas Dilger</div>
            <div>Lustre Principal Architect</div>
            <div>Whamcloud/DDN</div>
          </div>
        </div>
      </div>
      <br>
    </blockquote>
  </body>
</html>