<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Le 12/05/2015 20:27, Nathan Rutman a
écrit :<br>
</div>
<blockquote
cite="mid:CAB_j=MdgcH6_3Y0RopcL_YaX86iNrVjOo7Pp3dJD1kJvhVAcJQ@mail.gmail.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<div dir="ltr">Someone sent me a link to this:
<div><a moz-do-not-send="true"
href="http://arxiv.org/pdf/1505.02656v1.pdf">http://arxiv.org/pdf/1505.02656v1.pdf</a></div>
<div>Very cool. We'll need to start using that.</div>
<div><br>
</div>
<div>This reminded me to send my changelog/robinhood/HSM
concerns that I brought up at LUG to you guys for your
thoughts.</div>
<div><br>
</div>
<div>1. What should happen when the changelog on an MDS fills
up? Maybe LCAP helps with the processing rate, but
fundamentally the issue can still occur if nothing consumes
the records, due to various software or communication errors.
We must either stop recording changes and risk losing change
tracking, or stall MDS processing. (I believe that at the
moment this will just crash the MDS.) We probably need a high
water mark.</div>
<div><br>
</div>
<div>2. There should be some kind of rate limiting for HSM
requests (RH to MDS), so that the number of HSM requests
queued in the coordinator doesn't grow without bound. At some
point we probably need to return -EAGAIN to RH.</div>
<div><br>
</div>
<div>3. It feels like there needs to be some feedback from the
backend HSM storage to RH, in particular to pass back a
"backend full" message. We can presumably pass a backend
ENOSPC from the copytool back to the coordinator, but how does
that message get back to Robinhood? I guess the coordinator
could start returning ENOSPC for subsequent archive requests
from RH, but then we have to clear that response once the
backend condition clears.</div>
<div><br clear="all">
<div>
<div class="gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr"><b>--</b>
<div><font size="1"><b>Nathan Rutman · <font
color="#666666">Principal Systems Architect</font><br>
<font color="#0b5394">Seagate Technology</font></b><b> · </b>+1
503 877-9507<b> · </b>GMT-8</font></div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
Hello Nathan,<br>
<br>
1: when the changelog catalog is full (4B entries IIRC), Lustre
should either automatically clear the catalog or turn the FS
read-only (tunable, of course). I want to propose a patch for this
but don't have one yet.<br>
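To make the idea concrete, here is a rough sketch of the tunable
behaviour I have in mind (the names, the policy enum and the structure
are all hypothetical, nothing like this exists in Lustre today):<br>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tunable: what to do when the catalog is full. */
enum clg_full_policy { CLG_CLEAR_OLDEST, CLG_READONLY };

struct clg_catalog {
        uint64_t             nr_entries;
        uint64_t             max_entries;  /* ~4 billion in practice */
        enum clg_full_policy policy;
        bool                 fs_readonly;
};

/* Returns true if the record was accepted. */
static bool clg_record(struct clg_catalog *cat)
{
        if (cat->nr_entries >= cat->max_entries) {
                if (cat->policy == CLG_CLEAR_OLDEST) {
                        /* drop unconsumed records, lose change tracking */
                        cat->nr_entries = 0;
                } else {
                        /* stop MDS modifications instead of crashing */
                        cat->fs_readonly = true;
                        return false;
                }
        }
        cat->nr_entries++;
        return true;
}
```

Either branch is better than the current behaviour; which one to take
would be the administrator's choice.<br>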
<br>
2: Right, there is no limit at the moment. I think what is
needed there is a high watermark on the number of pending requests
rather than rate limiting. Note that on the robinhood side you can
set limits on the number of active requests.<br>
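As a sketch, the admission check on the coordinator side could look
like this (the watermark tunable and the names are hypothetical):<br>

```c
#include <errno.h>

#define CDT_HIGH_WATERMARK 1024  /* hypothetical tunable */

struct coordinator {
        unsigned int pending;  /* HSM requests currently queued */
};

/* Hypothetical admission check run before queueing an HSM request. */
static int cdt_enqueue(struct coordinator *cdt)
{
        if (cdt->pending >= CDT_HIGH_WATERMARK)
                return -EAGAIN;  /* caller (e.g. robinhood) backs off and retries */
        cdt->pending++;
        return 0;
}
```

Returning -EAGAIN above the watermark keeps the coordinator queue
bounded and leaves the retry policy to the client.<br>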
<br>
3: As you say, the copytools can propagate error codes back to
the coordinator, indicating whether they are retryable or not.
Non-retryable errors would cause the requests to fail. Lustre could
then either emit a changelog record for failed requests (which is
on the edge of what changelogs are for, though...) or we could add
some mechanism to rbh to let it react when it detects that too many
requests have failed. That said, a large number of failed requests
is probably something that should be detected and handled by
monitoring systems; avoiding overly tight coupling between the HSM
components is desirable.<br>
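A sketch of such a retryable/non-retryable classification (the
function and the exact error list are hypothetical, for illustration
only):<br>

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical classification of copytool errors reported to the
 * coordinator: transient errors get the request requeued, permanent
 * ones fail it and are left to monitoring to pick up. */
static bool hsm_error_is_retryable(int err)
{
        switch (err) {
        case -EAGAIN:
        case -EINTR:
        case -ETIMEDOUT:
                return true;   /* transient: requeue the request */
        case -ENOSPC:          /* backend full: permanent until cleared */
        default:
                return false;  /* fail the request */
        }
}
```

A backend-full ENOSPC falls in the non-retryable bucket, which is why
something outside the coordinator has to notice it.<br>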
<br>
<br>
Regards<br>
<br>
-- <br>
Henri<br>
</body>
</html>