<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">On 14/10/16 14:38, Dilger, Andreas
      wrote:<br>
    </div>
    <blockquote
      cite="mid:28E8A029-096C-48ED-862E-EBC84702B5E1@intel.com"
      type="cite">
      <style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Consolas;
        panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
p
        {mso-style-priority:99;
        margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman";}
span.apple-style-span
        {mso-style-name:apple-style-span;}
p.emailquote, li.emailquote, div.emailquote
        {mso-style-name:emailquote;
        margin-top:0in;
        margin-right:0in;
        margin-bottom:0in;
        margin-left:1.0pt;
        margin-bottom:.0001pt;
        border:none;
        padding:0in;
        font-size:12.0pt;
        font-family:"Times New Roman";}
span.EmailStyle20
        {mso-style-type:personal-reply;
        font-family:Calibri;
        color:windowtext;}
span.msoIns
        {mso-style-type:export-only;
        mso-style-name:"";
        text-decoration:underline;
        color:teal;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
        {page:WordSection1;}
--></style>
      <div class="WordSection1">
        <p class="MsoNormal"><span
            style="font-size:11.0pt;font-family:Calibri">John, with
            newer Lustre clients it is possible for multiple threads to
            submit non-overlapping writes concurrently (also not
            conflicting within a single page), see LU-1669 for details.<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:11.0pt;font-family:Calibri"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:11.0pt;font-family:Calibri">Even so,
            O_DIRECT writes need to be synchronous to disk on the OSS,
            as Patrick reports, because if the OSS fails before the
            write is on disk there is no cached copy of the data on the
            client that can be used to resend the RPC.<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:11.0pt;font-family:Calibri"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:11.0pt;font-family:Calibri">The problem is
            that the ZFS OSD has very long transaction commit times for
            synchronous writes because it does not yet have support for
            the ZIL.  Buffered writes avoid this cost.  If you really
            want to use O_DIRECT, very large O_DIRECT writes (e.g. 40MB
            or larger) together with large RPCs (4MB, or up to 16MB in
            2.9.0) may help by amortizing the sync overhead.
            <o:p></o:p></span></p>
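        <p class="MsoNormal"><span
            style="font-size:11.0pt;font-family:Calibri">As a rough
            illustration of the amortization argument, here is a minimal
            Python sketch. It assumes a simple model that is not from
            this thread: every O_DIRECT write pays one fixed sync cost
            and then streams at a constant rate. The 0.12 s cost and the
            800 MB/s rate are borrowed from the figures Riccardo
            reported (about 120 ms per 1MB write, about 800MB/sec
            buffered). The function name and the model itself are
            hypothetical, and a real ZFS OSD will not behave this
            cleanly.</span></p>
<pre>
```python
# Hypothetical model: each synchronous write costs a fixed sync overhead
# plus the time needed to stream its data at a constant rate.

def effective_throughput_mb_s(write_size_mb, sync_overhead_s, stream_mb_s):
    """Return MB/s for back-to-back synchronous writes of one size."""
    transfer_s = write_size_mb / stream_mb_s
    return write_size_mb / (sync_overhead_s + transfer_s)


if __name__ == "__main__":
    SYNC_OVERHEAD_S = 0.12   # assumption: taken from the reported 120 ms latency
    STREAM_MB_S = 800.0      # assumption: taken from the reported buffered rate
    for size_mb in (1, 4, 16, 40, 160):
        rate = effective_throughput_mb_s(size_mb, SYNC_OVERHEAD_S, STREAM_MB_S)
        print(f"{size_mb:4d} MB writes: {rate:6.1f} MB/s")
```
</pre>
        <p class="MsoNormal"><span
            style="font-size:11.0pt;font-family:Calibri">Under these
            assumptions a 1MB write is dominated by the sync cost, while
            a 40MB write spends a meaningful share of its time moving
            data, so the achieved rate rises with the write size.</span></p>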
        <p class="MsoNormal"><span
            style="font-size:11.0pt;font-family:Calibri"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:11.0pt;font-family:Calibri">Riccardo,<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:11.0pt;font-family:Calibri">The other
            potential issue is that you have 20 OSTs on a single OSS,
            which isn't going to have very good performance.  Spreading
            the OSTs across multiple OSS nodes is going to improve your
            performance significantly when there are multiple clients
            writing, as there will be N times the OSS network bandwidth,
            N times the CPU, N times the RAM.  It only makes sense to
            have 20 OSTs/OSS if your workload is only a single client
            and you want the maximum possible capacity for a given cost.</span></p>
      </div>
    </blockquote>
    <br>
    Hello Andreas,<br>
    each OST has its own VDEV and its own zpool.<br>
    Thank you.<br>
    <br>
    <blockquote
      cite="mid:28E8A029-096C-48ED-862E-EBC84702B5E1@intel.com"
      type="cite">
      <div class="WordSection1">
        <p class="MsoNormal"><span
            style="font-size:11.0pt;font-family:Calibri"><o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:11.0pt;font-family:Calibri"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:11.0pt;font-family:Calibri">Is each OST a
            separate VDEV and separate zpool, or are they a single
            zpool?  Separate zpools have less overhead for maximum
            performance, but only one VDEV per zpool means that metadata
            ditto blocks are written twice per RAID-Z2 VDEV, which isn't
            very efficient.  Having at least 3 VDEVs per zpool is better
            in this regard.<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:11.0pt;font-family:Calibri"><o:p> </o:p></span></p>
        <div>
          <div>
            <div>
              <p class="MsoNormal"><span
                  style="font-size:10.5pt;font-family:Calibri;color:black">Cheers,
                  Andreas<o:p></o:p></span></p>
            </div>
            <div>
              <p class="MsoNormal"><span
                  style="font-size:10.5pt;font-family:Calibri;color:black">-- <o:p></o:p></span></p>
            </div>
            <div>
              <p class="MsoNormal"><span
                  style="font-size:10.5pt;font-family:Calibri;color:black">Andreas
                  Dilger<o:p></o:p></span></p>
            </div>
          </div>
          <div>
            <p class="MsoNormal"><span
                style="font-size:10.5pt;font-family:Calibri;color:black">Lustre
                Principal Architect<o:p></o:p></span></p>
          </div>
        </div>
        <p class="MsoNormal"><span
            style="font-size:10.5pt;font-family:Calibri;color:black">Intel
            High Performance Data Division</span><span
            style="font-size:11.0pt;font-family:Calibri"><o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:11.0pt;font-family:Calibri"><o:p> </o:p></span></p>
        <div>
          <div>
            <div>
              <p class="MsoNormal" style="margin-left:.5in"><span
                  style="font-size:11.0pt;font-family:Consolas;color:black">On
                  2016/10/14, 15:22, "John Bauer" &lt;<a
                    moz-do-not-send="true"
                    href="mailto:bauerj@iodoctors.com">bauerj@iodoctors.com</a>&gt;
                  wrote:<o:p></o:p></span></p>
            </div>
          </div>
        </div>
        <div>
          <p class="MsoNormal" style="margin-left:.5in"><o:p> </o:p></p>
        </div>
        <div>
          <p class="MsoNormal" style="margin-left:.5in">Patrick<o:p></o:p></p>
        </div>
        <div id="AppleMailSignature">
          <p class="MsoNormal" style="margin-left:.5in">I thought at one
            time there was an inode lock held for the duration of a
            direct I/O read or write, so that even if one had multiple
            application threads writing direct, only one was "in flight"
            at a time. Has that changed?<o:p></o:p></p>
        </div>
        <div id="AppleMailSignature">
          <p class="MsoNormal" style="margin-left:.5in">John<br>
            <br>
            <o:p></o:p></p>
        </div>
        <div>
          <p class="MsoNormal"
style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:12.0pt;margin-left:.5in"><br>
            On Oct 14, 2016, at 3:16 PM, Patrick Farrell &lt;<a
              moz-do-not-send="true" href="mailto:paf@cray.com">paf@cray.com</a>&gt;
            wrote:<o:p></o:p></p>
        </div>
        <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
          <div>
            <div id="divtagdefaultwrapper">
              <p style="margin-left:.5in"><span
                  style="font-family:Calibri;color:black">Sorry, I
                  phrased one thing wrong:<br>
                  I said "transferring to the network", but I believe
                  the write actually cannot return until the client has
                  received confirmation that the data was received
                  successfully.<o:p></o:p></span></p>
              <p style="margin-left:.5in"><span
                  style="font-family:Calibri;color:black"><o:p> </o:p></span></p>
              <p style="margin-left:.5in"><span
                  style="font-family:Calibri;color:black">In any case,
                  only one I/O (per thread) can be outstanding at a time
                  with direct I/O.<o:p></o:p></span></p>
            </div>
            <div class="MsoNormal"
              style="margin-left:.5in;text-align:center" align="center">
              <hr align="center" size="2" width="98%">
            </div>
            <div id="divRplyFwdMsg">
              <p class="MsoNormal" style="margin-left:.5in"><b><span
                    style="font-size:11.0pt;font-family:Calibri;color:black">From:</span></b><span
style="font-size:11.0pt;font-family:Calibri;color:black"> lustre-discuss
                  &lt;<a moz-do-not-send="true"
                    href="mailto:lustre-discuss-bounces@lists.lustre.org">lustre-discuss-bounces@lists.lustre.org</a>&gt;
                  on behalf of Patrick Farrell &lt;<a
                    moz-do-not-send="true" href="mailto:paf@cray.com">paf@cray.com</a>&gt;<br>
                  <b>Sent:</b> Friday, October 14, 2016 3:12:22 PM<br>
                  <b>To:</b> Riccardo Veraldi; <a
                    moz-do-not-send="true"
                    href="mailto:lustre-discuss@lists.lustre.org">lustre-discuss@lists.lustre.org</a><br>
                  <b>Subject:</b> Re: [lustre-discuss] Lustre on ZFS
                  pooer direct I/O performance</span>
                <o:p></o:p></p>
              <div>
                <p class="MsoNormal" style="margin-left:.5in"> <o:p></o:p></p>
              </div>
            </div>
            <div>
              <div>
                <div id="x_divtagdefaultwrapper">
                  <p style="margin-left:.5in"><span
                      style="font-family:Calibri;color:black">Riccardo,<o:p></o:p></span></p>
                  <p style="margin-left:.5in"><span
                      style="font-family:Calibri;color:black"><o:p> </o:p></span></p>
                  <p style="margin-left:.5in"><span
                      style="font-family:Calibri;color:black">While the
                      difference is extreme, direct I/O write
                      performance will always be poor.  Direct I/O
                      writes cannot be asynchronous, since they don't
                      use the page cache.  This means Lustre cannot
                      return from one write (and start the next) until
                      it has finished transferring the data to the
                      network.<o:p></o:p></span></p>
                  <p style="margin-left:.5in"><span
                      style="font-family:Calibri;color:black"><o:p> </o:p></span></p>
                  <p style="margin-left:.5in"><span
                      style="font-family:Calibri;color:black">This means
                      you can only have one I/O in flight at a time. 
                      Good write performance from Lustre (or any network
                      filesystem) depends on keeping a lot of data in
                      flight at once.<o:p></o:p></span></p>
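                  <p style="margin-left:.5in"><span
                      style="font-family:Calibri;color:black">A minimal
                      Python sketch of the arithmetic behind this. The
                      1 MB write size and 120 ms latency are the figures
                      from the original report. The function names and
                      the idealized linear scaling with the number of
                      writes in flight are assumptions made for
                      illustration, not measured Lustre behaviour.</span></p>
<pre>
```python
# Illustrative arithmetic only: throughput of synchronous writes as a
# function of write size, per-write latency, and writes in flight.

def serialized_throughput_mb_s(write_size_mb, latency_s):
    """MB/s when each write must complete before the next one starts."""
    return write_size_mb / latency_s


def pipelined_throughput_mb_s(write_size_mb, latency_s, in_flight):
    """Idealized MB/s with several writes outstanding at once.

    Assumes latency stays constant and nothing else is a bottleneck.
    """
    return in_flight * serialized_throughput_mb_s(write_size_mb, latency_s)


if __name__ == "__main__":
    one = serialized_throughput_mb_s(1.0, 0.120)
    print(f"one write in flight: {one:.1f} MB/s")
    for n in (4, 16, 64):
        rate = pipelined_throughput_mb_s(1.0, 0.120, n)
        print(f"{n} writes in flight: {rate:.1f} MB/s")
```
</pre>
                  <p style="margin-left:.5in"><span
                      style="font-family:Calibri;color:black">With one
                      1 MB write in flight at 120 ms each, the result is
                      roughly 8 MB/s, which is consistent with the
                      direct I/O rate reported in the original message.</span></p>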
                  <p style="margin-left:.5in"><span
                      style="font-family:Calibri;color:black"><o:p> </o:p></span></p>
                  <p style="margin-left:.5in"><span
                      style="font-family:Calibri;color:black">What sort
                      of direct write performance were you hoping for? 
                      It will never match that 800 MB/s from one thread
                      you see with buffered I/O.<o:p></o:p></span></p>
                  <p style="margin-left:.5in"><span
                      style="font-family:Calibri;color:black"><o:p> </o:p></span></p>
                  <p style="margin-left:.5in"><span
                      style="font-family:Calibri;color:black">- Patrick<o:p></o:p></span></p>
                </div>
                <div class="MsoNormal"
                  style="margin-left:.5in;text-align:center"
                  align="center">
                  <hr align="center" size="2" width="98%">
                </div>
                <div id="x_divRplyFwdMsg">
                  <p class="MsoNormal" style="margin-left:.5in"><b><span
style="font-size:11.0pt;font-family:Calibri;color:black">From:</span></b><span
style="font-size:11.0pt;font-family:Calibri;color:black"> lustre-discuss
                      &lt;<a moz-do-not-send="true"
                        href="mailto:lustre-discuss-bounces@lists.lustre.org">lustre-discuss-bounces@lists.lustre.org</a>&gt;
                      on behalf of Riccardo Veraldi &lt;<a
                        moz-do-not-send="true"
                        href="mailto:Riccardo.Veraldi@cnaf.infn.it">Riccardo.Veraldi@cnaf.infn.it</a>&gt;<br>
                      <b>Sent:</b> Friday, October 14, 2016 2:22:32 PM<br>
                      <b>To:</b> <a moz-do-not-send="true"
                        href="mailto:lustre-discuss@lists.lustre.org">lustre-discuss@lists.lustre.org</a><br>
                      <b>Subject:</b> [lustre-discuss] Lustre on ZFS
                      pooer direct I/O performance</span>
                    <o:p></o:p></p>
                  <div>
                    <p class="MsoNormal" style="margin-left:.5in"> <o:p></o:p></p>
                  </div>
                </div>
              </div>
              <div>
                <p class="MsoNormal" style="margin-left:.5in"><span
                    style="font-size:10.0pt">Hello,<br>
                    <br>
                    I would like to know how I can improve the
                    performance of my Lustre cluster.<br>
                    <br>
                    I have 1 MDS and 1 OSS with 20 OSTs defined.<br>
                    <br>
                    Each OST is an 8-disk RAIDZ2.<br>
                    <br>
                    Single-process write performance is around
                    800 MB/s.<br>
                    <br>
                    However, if I force direct I/O, for example using
                    oflag=direct in dd, the write performance drops as
                    low as 8 MB/s with a 1 MB block size, and each write
                    has about 120 ms of latency.<br>
                    <br>
                    I used these ZFS settings:<br>
                    <br>
                    options zfs zfs_prefetch_disable=1<br>
                    options zfs zfs_txg_history=120<br>
                    options zfs metaslab_debug_unload=1<br>
                    <br>
                    I am quite worried about the low performance.<br>
                    <br>
                    Any hints or suggestions that may help me improve
                    the situation?<br>
                    <br>
                    <br>
                    Thank you<br>
                    <br>
                    <br>
                    Rick<br>
                    <br>
                    <br>
                    _______________________________________________<br>
                    lustre-discuss mailing list<br>
                    <a moz-do-not-send="true"
                      href="mailto:lustre-discuss@lists.lustre.org">lustre-discuss@lists.lustre.org</a><br>
                    <a moz-do-not-send="true"
                      href="http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org">http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org</a><o:p></o:p></span></p>
              </div>
            </div>
          </div>
        </blockquote>
      </div>
    </blockquote>
    <p><br>
    </p>
  </body>
</html>