<html>
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    Ah, of course - We're only talking about restriping existing stuff.<br>
    <br>
    Yes, that's just fine - No lock conflicts on reading.  Looks good to
    me.<br>
    <br>
    This is probably also something we'd want to allow via HSM.  Not
    sure how the current patches interact with that (haven't looked).<br>
    <br>
    - Patrick<br>
    <br>
    <div class="moz-cite-prefix">On 05/19/2016 10:53 AM, Nathan Dauchy -
      NOAA Affiliate wrote:<br>
    </div>
    <blockquote
cite="mid:CAO9q2Dk3SxawxfYEU_hNcsy8uqx9UWXKQF6doH3Q3FbpSkd9aQ@mail.gmail.com"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <div dir="ltr">Patrick,
        <div><br>
        </div>
        <div>You bring up an interesting point on read vs. write
          performance.  We can't use lfs_migrate to control the stripe
          count used for writes (obviously), so that is left up to the
          application developer or at least the user to intelligently
          place shared access files in a directory with wider striping. 
          Restriping a file with lfs_migrate could change *read*
          performance characteristics, so there is indeed some risk
          there... but your work implies that is not too bad.  If we
          only restripe files that are "old", then the likelihood that
          they will be read again goes way down, and balancing capacity
          used plays a bigger factor.  Bottom line is that I think
          restriping has more potential for upsides than down. :)</div>
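        <div><br>
        </div>
        <div>As a rough sketch of limiting restriping to "old" files: the
          helper below is hypothetical (it is not part of lfs_migrate)
          and simply lists regular files that have not been modified for
          a given number of days and exceed a given size.  The age and
          size thresholds are placeholders to be chosen per site.</div>
<pre>
```shell
#!/bin/bash
# Hypothetical helper, not part of lfs_migrate.
# Prints NUL-separated paths of regular files under DIR whose last
# modification is more than MIN_AGE_DAYS days ago (as counted by
# find -mtime) and whose size is larger than MIN_SIZE, where MIN_SIZE
# is a find(1) size expression such as 1G.
list_restripe_candidates() {
    local dir=$1 min_age_days=$2 min_size=$3
    find "$dir" -type f -mtime +"$min_age_days" -size +"$min_size" -print0
}
```
</pre>
        <div>The NUL-separated output could then be piped into
          "lfs_migrate -0", which according to its usage message reads
          null-separated file names from stdin; adding -n first gives a
          dry run, for example: list_restripe_candidates /path/to/dir 90
          1G | lfs_migrate -0 -n -c 4</div>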
        <div><br>
        </div>
        <div>Thanks,</div>
        <div>Nathan</div>
        <div><br>
        </div>
        <div>
          <div class="gmail_extra"><br>
            <div class="gmail_quote">On Wed, May 18, 2016 at 1:22 PM,
              Patrick Farrell <span dir="ltr"><<a
                  moz-do-not-send="true" href="mailto:paf@cray.com"
                  target="_blank">paf@cray.com</a>></span> wrote:<br>
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <div bgcolor="#FFFFFF" text="#000000"> Nathan,<br>
                  <br>
                  This *is* excellent fodder for discussion.<br>
                  <br>
                  A few thoughts from a developer perspective.  When you
                  stripe a file to multiple OSTs, you're spreading the
                  data out across multiple targets, which (to my mind)
                  has two purposes:<br>
                  1) More even space usage across OSTs (mostly relevant
                  for *really* big files, since in general, singly
                  striped files are distributed across OSTs anyway)<br>
                  2) Better bandwidth/parallelism for accesses to the
                  file.<br>
                  <br>
                  The first one lends itself well to a file size based
                  heuristic, but I'm not sure the second one does. 
                  That's more about access patterns.  I'm not sure that
                  you see much bandwidth benefit from striping with a
                  single client, at least as long as an individual OST
                  is fast relative to a client (increasingly common, I
                  think, with flash and larger RAID arrays).  So then,
                  whatever the file size, if it's accessed from one
                  client, it should probably be single striped.<br>
                  <br>
                  Also, for shared files, client count relative to
                  stripe count has a huge impact on write performance. 
                  Assuming strided I/O patterns, anything more than 1
                  client per stripe/OST is actually worse than 1
                  client.  (See my lock ahead presentation at LUG'15 for
                  more on this.)  Read performance doesn't share this
                  weirdness, though.<br>
                  <br>
                  All that's to say that for case 2 above, at least for
                  writing, it's access pattern/access parallelism, not
                  size, which matters.  I'm sure there's some
                  correlation between file size and how parallel the
                  access pattern is, but it might be very loose, and at
                  least write performance doesn't scale linearly with
                  stripe count.  Instead, the behavior is complex.<br>
                  <br>
                  So in order to pick an ideal striping with case 2 in
                  mind, you really need to understand the application
                  access pattern.  I can't see another way to do that
                  goal justice.  (The Lustre ADIO in the MPI I/O library
                  does this, partly by controlling the I/O pattern
                  through I/O aggregation for collective I/Os.)<br>
                  <br>
                  So I think your tool can definitely help with case 1,
                  not so sure about case 2.<br>
                  <br>
                  - Patrick<br>
                  <br>
                  <div>On 05/18/2016 12:22 PM, Nathan Dauchy - NOAA
                    Affiliate wrote:<br>
                  </div>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div class="gmail_quote">
                        <div dir="ltr">
                          <div>
                            <div>Greetings All,</div>
                            <div><br>
                            </div>
                            <div>I'm looking for your experience and
                              perhaps some lively discussion regarding
                              "best practices" for choosing a file
                              stripe count.  The Lustre manual has good
                              tips on "Choosing a Stripe Size", and in
                              practice the default 1M rarely causes
                              problems on our systems. For Stripe Count, on
                              the other hand, it is far more difficult to
                              choose a single value that is efficient for
                              a general-purpose, multi-use, site-wide
                              file system.</div>
                            <div><br>
                            </div>
                            <div>Since there is the "increased overhead"
                              of striping, and weather applications do
                              unfortunately write MANY tiny files, we
                              usually keep the filesystem default stripe
                              count at 1.  Unfortunately, there are
                              several users who then write very large
                              and shared-access files with that
                              default.  I would like to be able to tell
                              them to restripe... but without digging
                              into the specific application and access
                              pattern it is hard to know what count to
                              recommend.  Plus there is the "stripe
                              these but not those" confusion... it is
                              common for users to have a few very large
                              data files and many small log or output
                              image files in the SAME directory.</div>
                            <div><br>
                            </div>
                            <div>What do you all recommend as a
                              reasonable rule of thumb that works for
                              "most" users' needs, where stripe count
                              can be determined based only on static
                              data attributes (such as file size)?  I
                              have heard a "stripe per GB" idea, but
                              some have said that escalates to too many
                              stripes too fast.  ORNL has a knowledge
                              base article that says use a stripe count
                              of "File size / 100 GB", but does that
                              make sense for smaller, non-DOE sites? 
                              Would stripe count = Log2(size_in_GB)+1 be
                              more generally reasonable?  For a 1 TB
                              file, that actually works out to be
                              similar to ORNL, only gets there more
                              gradually:</div>
                            <div>
                              <div>    <a moz-do-not-send="true"
                                  href="https://www.olcf.ornl.gov/kb_articles/lustre-basics/#Stripe_Count"
                                  target="_blank">https://www.olcf.ornl.gov/kb_articles/lustre-basics/#Stripe_Count</a><br>
                              </div>
                            </div>
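                            <div><br>
                            </div>
                            <div>A minimal sketch of the
                              Log2(size_in_GB)+1 rule of thumb, using only
                              bash integer arithmetic.  The function name
                              auto_stripe_count is hypothetical and not
                              part of lfs_migrate.  It assumes the
                              logarithm is rounded down and the result is
                              clamped to a minimum of 1, which is one
                              possible rounding choice; integer math also
                              sidesteps parsing fractional output from bc
                              or awk.</div>
<pre>
```shell
#!/bin/bash
# Hypothetical helper, not part of lfs_migrate.
# Stripe count = floor(log2(size_in_GB)) + 1, with a minimum of 1.
# For a whole number of GiB >= 1 this equals its bit length, so the
# loop just counts how many times the value can be halved.
auto_stripe_count() {
    local bytes=$1
    local gib=$(( bytes / 1073741824 ))   # whole GiB, rounded down
    local count=0
    while [ "$gib" -gt 0 ]; do
        gib=$(( gib >> 1 ))
        count=$(( count + 1 ))
    done
    # Files smaller than 1 GiB fall through with count=0; clamp to 1.
    if [ "$count" -lt 1 ]; then
        count=1
    fi
    echo "$count"
}
```
</pre>
                            <div>With this rounding, auto_stripe_count
                              1099511627776 (a 1 TiB file) prints 11, and
                              any file smaller than 1 GiB gets a stripe
                              count of 1.</div>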
                            <div><br>
                            </div>
                            <div>Ideally, I would like to have a tool to
                              give the users and say "go restripe your
                              directory with this command" and it will
                              do the right thing in 90% of cases.  See
                              the rough patch to lfs_migrate (included
                              below) which should help explain what I'm
                              thinking.  Probably there are more
                              efficient ways of doing things, but I have
                              tested it lightly and it works as a
                              proof-of-concept.</div>
                            <div><br>
                            </div>
                            <div>With a good programmatic rule of thumb,
                              we (as a Lustre community!) can eventually
                              work with application developers to embed
                              the stripe count selection into their code
                              and get things at least closer to right up
                              front.  Even if trial and error is
                              involved to find the optimal setting, at
                              least the rule of thumb can be a
                              _starting_point_ for the users, and they
                              can tweak it from there based on
                              application, model, scale, dataset, etc.</div>
                            <div><br>
                            </div>
                            <div>Thinking farther down the road, with
                              progressive file layout, what algorithm
                              will be used as the default?  If Lustre
                              gets to the point where it can rebalance
                              OST capacity behind the scenes, could it
                              also make some intelligent choice about
                              restriping very large files to spread out
                              load and better balance capacity?  (Would
                              that mean we need a bit set on the file to
                              flag whether the stripe info was set
                              specifically by the user or automatically
                              by Lustre tools or it was just using the
                              system default?)  Can the filesystem track
                              concurrent access to a file, and perhaps
                              migrate the file and adjust stripe count
                              based on number of active clients?</div>
                            <div><br>
                            </div>
                            <div>I appreciate any and all suggestions,
                              clarifying questions, heckles, etc.  I
                              know this is a lot of questions, and I
                              certainly don't expect definitive answers
                              on all of them, but I hope it is at least
                              food for thought and discussion! :)</div>
                            <div><br>
                            </div>
                            <div>Thanks,</div>
                            <div>Nathan</div>
                            <div><br>
                            </div>
                          </div>
                          <div><br>
                          </div>
                          <div>
                            <div>--- lfs_migrate-2.7.1<span
                                style="white-space:pre-wrap"> </span>2016-05-13

                              12:46:06.828032000 +0000</div>
                            <div>+++ lfs_migrate.auto-count<span
                                style="white-space:pre-wrap"> </span>2016-05-17

                              21:37:19.036589000 +0000</div>
                            <div>@@ -21,8 +21,10 @@</div>
                            <div> </div>
                            <div> usage() {</div>
                            <div>     cat -- <<USAGE 1>&2</div>
                            <div>-usage: lfs_migrate [-c
                              <stripe_count>] [-h] [-l] [-n] [-q]
                              [-R] [-s] [-y] [-0]</div>
                            <div>+usage: lfs_migrate [-A] [-c
                              <stripe_count>] [-h] [-l] [-n] [-q]
                              [-R] [-s] [-v] [-y] [-0]</div>
                            <div>                    [file|dir ...]</div>
                            <div>+    -A restripe file using an
                              automatically selected stripe count</div>
                            <div>+       currently Stripe Count =
                              Log2(size_in_GB) + 1</div>
                            <div>     -c <stripe_count></div>
                            <div>        restripe file using the
                              specified stripe count</div>
                            <div>     -h show this usage message</div>
                            <div>@@ -31,11 +33,11 @@</div>
                            <div>     -q run quietly (don't print
                              filenames or status)</div>
                            <div>     -R restripe file using default
                              directory striping</div>
                            <div>     -s skip file data comparison after
                              migrate</div>
                            <div>+    -v be verbose and print
                              information about each file</div>
                            <div>     -y answer 'y' to usage question</div>
                            <div>     -0 input file names on stdin are
                              separated by a null character</div>
                            <div> </div>
                            <div>-The -c <stripe_count> option may
                              not be specified at the same time as</div>
                            <div>-the -R option.</div>
                            <div>+Only one of the '-A', '-c', or '-R'
                              options may be specified at a time.</div>
                            <div> </div>
                            <div> If a directory is an argument, all
                              files in the directory are migrated.</div>
                            <div> If no file/directory is given, the
                              file list is read from standard input.</div>
                            <div>@@ -48,15 +50,19 @@</div>
                            <div> </div>
                            <div> OPT_CHECK=y</div>
                            <div> OPT_STRIPE_COUNT=""</div>
                            <div>+OPT_AUTOSTRIPE=""</div>
                            <div>+OPT_VERBOSE=""</div>
                            <div> </div>
                            <div>-while getopts "c:hlnqRsy0" opt $*; do</div>
                            <div>+while getopts "Ac:hlnqRsvy0" opt $*;
                              do</div>
                            <div>     case $opt in</div>
                            <div>+<span style="white-space:pre-wrap"> </span>A)

                              OPT_AUTOSTRIPE=y;;</div>
                            <div> <span style="white-space:pre-wrap"> </span>c)

                              OPT_STRIPE_COUNT=$OPTARG;;</div>
                            <div> <span style="white-space:pre-wrap"> </span>l)

                              OPT_NLINK=y;;</div>
                            <div> <span style="white-space:pre-wrap"> </span>n)

                              OPT_DRYRUN=n; OPT_YES=y;;</div>
                            <div> <span style="white-space:pre-wrap"> </span>q)

                              ECHO=:;;</div>
                            <div> <span style="white-space:pre-wrap"> </span>R)

                              OPT_RESTRIPE=y;;</div>
                            <div> <span style="white-space:pre-wrap"> </span>s)

                              OPT_CHECK="";;</div>
                            <div>+<span style="white-space:pre-wrap"> </span>v)

                              OPT_VERBOSE=y;;</div>
                            <div> <span style="white-space:pre-wrap"> </span>y)

                              OPT_YES=y;;</div>
                            <div> <span style="white-space:pre-wrap"> </span>0)

                              OPT_NULL=y;;</div>
                            <div> <span style="white-space:pre-wrap"> </span>h|\?)

                              usage;;</div>
                            <div>@@ -69,6 +75,16 @@</div>
                            <div> <span style="white-space:pre-wrap"> </span>echo

                              "$(basename $0) error: The -c
                              <stripe_count> option may not"
                              1>&2</div>
                            <div> <span style="white-space:pre-wrap"> </span>echo
                              "be specified at the same time as the -R
                              option." 1>&2</div>
                            <div> <span style="white-space:pre-wrap"> </span>exit
                              1</div>
                            <div>+elif [ "$OPT_STRIPE_COUNT" -a
                              "$OPT_AUTOSTRIPE" ]; then</div>
                            <div>+<span style="white-space:pre-wrap"> </span>echo
                              ""</div>
                            <div>+<span style="white-space:pre-wrap"> </span>echo

                              "$(basename $0) error: The -c
                              <stripe_count> option may not"
                              1>&2</div>
                            <div>+<span style="white-space:pre-wrap"> </span>echo
                              "be specified at the same time as the -A
                              option." 1>&2</div>
                            <div>+<span style="white-space:pre-wrap"> </span>exit
                              1</div>
                            <div>+elif [ "$OPT_AUTOSTRIPE" -a
                              "$OPT_RESTRIPE" ]; then</div>
                            <div>+<span style="white-space:pre-wrap"> </span>echo
                              ""</div>
                            <div>+<span style="white-space:pre-wrap"> </span>echo

                              "$(basename $0) error: The -A option may
                              not be specified at" 1>&2</div>
                            <div>+<span style="white-space:pre-wrap"> </span>echo

                              "the same time as the -R option."
                              1>&2</div>
                            <div>+<span style="white-space:pre-wrap"> </span>exit
                              1</div>
                            <div> fi</div>
                            <div> </div>
                            <div> if [ -z "$OPT_YES" ]; then</div>
                            <div>@@ -107,7 +123,7 @@</div>
                            <div> <span style="white-space:pre-wrap"> </span>$ECHO
                              -n "$OLDNAME: "</div>
                            <div> </div>
                            <div> <span style="white-space:pre-wrap"> </span>#
                              avoid duplicate stat if possible</div>
                            <div>-<span style="white-space:pre-wrap"> </span>TYPE_LINK=($(LANG=C

                              stat -c "%h %F" "$OLDNAME" || true))</div>
                            <div>+<span style="white-space:pre-wrap"> </span>TYPE_LINK=($(LANG=C

                              stat -c "%h %F %s" "$OLDNAME" || true))</div>
                            <div> </div>
                            <div> <span style="white-space:pre-wrap"> </span>#
                              skip non-regular files, since they don't
                              have any objects</div>
                            <div> <span style="white-space:pre-wrap"> </span>#
                              and there is no point in trying to migrate
                              them.</div>
                            <div>@@ -127,11 +143,6 @@</div>
                            <div> <span style="white-space:pre-wrap"> </span>continue</div>
                            <div> <span style="white-space:pre-wrap"> </span>fi</div>
                            <div> </div>
                            <div>-<span style="white-space:pre-wrap"> </span>if
                              [ "$OPT_DRYRUN" ]; then</div>
                            <div>-<span style="white-space:pre-wrap"> </span>echo
                              -e "dry run, skipped"</div>
                            <div>-<span style="white-space:pre-wrap"> </span>continue</div>
                            <div>-<span style="white-space:pre-wrap"> </span>fi</div>
                            <div>-</div>
                            <div> <span style="white-space:pre-wrap"> </span>if
                              [ "$OPT_RESTRIPE" ]; then</div>
                            <div> <span style="white-space:pre-wrap"> </span>UNLINK=""</div>
                            <div> <span style="white-space:pre-wrap"> </span>else</div>
                            <div>@@ -140,16 +151,43 @@</div>
                            <div> <span style="white-space:pre-wrap"> </span>#
                              then we don't need to do this
                              getstripe/mktemp stuff.</div>
                            <div> <span style="white-space:pre-wrap"> </span>UNLINK="-u"</div>
                            <div> </div>
                            <div>-<span style="white-space:pre-wrap"> </span>[
                              "$OPT_STRIPE_COUNT" ] &&
                              COUNT=$OPT_STRIPE_COUNT ||</div>
                            <div>-<span style="white-space:pre-wrap"> </span>COUNT=$($LFS

                              getstripe -c "$OLDNAME" \</div>
                            <div>-<span style="white-space:pre-wrap"> </span>2>

                              /dev/null)</div>
                            <div> <span style="white-space:pre-wrap"> </span>SIZE=$($LFS

                              getstripe $LFS_SIZE_OPT "$OLDNAME" \</div>
                            <div> <span style="white-space:pre-wrap"> </span>
                                    2> /dev/null)</div>
                            <div>+<span style="white-space:pre-wrap"> </span>if
                              [ "$OPT_AUTOSTRIPE" ]; then</div>
                            <div>+<span style="white-space:pre-wrap"> </span>FILE_SIZE=${TYPE_LINK[3]}<br>
                            </div>
                            <div>+<span style="white-space:pre-wrap"> </span>#
                              (math in bash is dumb, so depend on common
                              tools, and there are options for that...)</div>
                            <div>+<span style="white-space:pre-wrap"> </span>#
                              Stripe Count = Log2(size_in_GB)</div>
                            <div>+<span style="white-space:pre-wrap"> </span>#COUNT=$(echo

                              $FILE_SIZE | awk '{printf
                              "%.0f\n",log($1/1024/1024/1024)/log(2)}')</div>
                            <div>+<span style="white-space:pre-wrap"> </span>#COUNT=$(printf

                              "%.0f\n" $(echo
                              "l($FILE_SIZE/1024/1024/1024) / l(2)" | bc
                              -l))</div>
                            <div>+<span style="white-space:pre-wrap"> </span>COUNT=$(echo

                              "l($FILE_SIZE/1024/1024/1024) / l(2) + 1"
                              | bc -l | cut -d . -f 1)</div>
                            <div>+<span style="white-space:pre-wrap"> </span>#
                              Stripe Count = size_in_GB</div>
                            <div>+<span style="white-space:pre-wrap"> </span>#COUNT=$(echo

                              "scale=0; $FILE_SIZE/1024/1024/1024" | bc
                              -l | cut -d . -f 1)</div>
                            <div>+<span style="white-space:pre-wrap"> </span>[
                              "$COUNT" -lt 1 ] && COUNT=1</div>
                            <div>+<span style="white-space:pre-wrap"> </span>#
                              (does it make sense to skip the file if
                              old</div>
                            <div>+<span style="white-space:pre-wrap"> </span>#
                              and new stripe count are identical?)</div>
                            <div>+<span style="white-space:pre-wrap"> </span>else</div>
                            <div>+<span style="white-space:pre-wrap"> </span>[
                              "$OPT_STRIPE_COUNT" ] &&
                              COUNT=$OPT_STRIPE_COUNT ||</div>
                            <div>+<span style="white-space:pre-wrap"> </span>COUNT=$($LFS

                              getstripe -c "$OLDNAME" \</div>
                            <div>+<span style="white-space:pre-wrap"> </span>2>

                              /dev/null)</div>
                            <div>+<span style="white-space:pre-wrap"> </span>fi</div>
                            <div> </div>
                            <div> <span style="white-space:pre-wrap"> </span>[
                              -z "$COUNT" -o -z "$SIZE" ] &&
                              UNLINK=""</div>
                            <div>-<span style="white-space:pre-wrap"> </span>SIZE=${LFS_SIZE_OPT}${SIZE}</div>
                            <div> <span style="white-space:pre-wrap"> </span>fi</div>
                            <div> </div>
                            <div>+<span style="white-space:pre-wrap"> </span>if
                              [ "$OPT_DRYRUN" ]; then</div>
                            <div>+<span style="white-space:pre-wrap"> </span>if
                              [ "$OPT_VERBOSE" ]; then</div>
                            <div>+<span style="white-space:pre-wrap"> </span>echo
                              -e "dry run, would use count=${COUNT}
                              size=${SIZE}"</div>
                            <div>+<span style="white-space:pre-wrap"> </span>else</div>
                            <div>+<span style="white-space:pre-wrap"> </span>echo
                              -e "dry run, skipped"</div>
                            <div>+<span style="white-space:pre-wrap"> </span>fi</div>
                            <div>+<span style="white-space:pre-wrap"> </span>continue</div>
                            <div>+<span style="white-space:pre-wrap"> </span>fi</div>
                            <div>+<span style="white-space:pre-wrap"> </span>if
                              [ "$OPT_VERBOSE" ]; then</div>
                            <div>+<span style="white-space:pre-wrap"> </span>echo
                              -n "(count=${COUNT} size=${SIZE}) "</div>
                            <div>+<span style="white-space:pre-wrap"> </span>fi</div>
                            <div>+</div>
                            <div>+<span style="white-space:pre-wrap"> </span>[
                              "$SIZE" ] &&
                              SIZE=${LFS_SIZE_OPT}${SIZE}</div>
                            <div>+</div>
                            <div> <span style="white-space:pre-wrap"> </span>#
                              first try to migrate inside lustre</div>
                            <div> <span style="white-space:pre-wrap"> </span>#
                              if failed go back to old rsync mode</div>
                            <div> <span style="white-space:pre-wrap"> </span>if
                              [[ $RSYNC_MODE == false ]]; then</div>
                          </div>
                          <div><br>
                          </div>
                        </div>
                      </div>
                    </div>
                    <br>
                    <br>
                    <pre>_______________________________________________
lustre-discuss mailing list
<a moz-do-not-send="true" href="mailto:lustre-discuss@lists.lustre.org" target="_blank">lustre-discuss@lists.lustre.org</a>
<a moz-do-not-send="true" href="http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org" target="_blank">http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org</a>
</pre>
                  </blockquote>
                  <br>
                </div>
                <br>
                <br>
              </blockquote>
            </div>
            <br>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>