<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>We've been using draid in production since 2020 and I think we're
generally happy with it. We have quite a few Lustre clusters, and
on the majority of them we run 90-drive JBODs with 1 OST/OSS
node, 1 OST/pool, and 1 pool/JBOD. We use a draid2:8d:90c:2s config
and let the distributed spares rebuild (~2-4 hours) before
replacing the 16TB physical disk, which then rebuilds within a day
or so. <br>
</p>
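<p>Roughly, the pool creation and the spare workflow look like the sketch
below. The device aliases (L0..L89, and L42 as the failed drive) are
illustrative placeholders, not our exact device names:</p>
<pre># Create a dRAID2 pool: 8 data disks per stripe, 2 distributed spares,
# 90 children (the child count is taken from the number of disks listed).
zpool create asp8 draid2:8d:2s /dev/disk/by-vdev/L{0..89}

# When a drive fails, attach a distributed spare (ZED can also do this
# automatically); this sequential rebuild is what finishes in a few hours.
zpool replace asp8 L42 draid2-0-0

# After the physical 16TB disk is swapped, rebuild onto it; once that
# completes, the distributed spare returns to AVAIL on its own.
zpool replace asp8 L42 /dev/disk/by-vdev/L42-new
</pre>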
<p>An important note about this configuration is that we also include
NVMe in the pool as special allocation devices, configured to store
small blocks up to 16K. We probably have much more NVMe space than
we need because the NVMe drives are large (zpool list -v shows each
mirror's allocation is still low), but we're happy with the performance.</p>
<pre>  NAME                    STATE     READ WRITE CKSUM
  asp8                    ONLINE       0     0     0
    draid2:8d:90c:2s-0    ONLINE       0     0     0
      L0                  ONLINE       0     0     0
      L1                  ONLINE       0     0     0
      ...
      L88                 ONLINE       0     0     0
      L89                 ONLINE       0     0     0
  special
    mirror-1              ONLINE       0     0     0
      N6                  ONLINE       0     0     0
      N7                  ONLINE       0     0     0
    mirror-2              ONLINE       0     0     0
      N8                  ONLINE       0     0     0
      N9                  ONLINE       0     0     0
    mirror-3              ONLINE       0     0     0
      N10                 ONLINE       0     0     0
      N11                 ONLINE       0     0     0
  spares
    draid2-0-0            AVAIL
    draid2-0-1            AVAIL
</pre>
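<p>The small-block routing mentioned above comes down to two steps: adding
the NVMe mirrors as special vdevs and setting the small-block cutoff so it
is inherited by the datasets. A rough sketch, with illustrative device
names rather than our exact ones:</p>
<pre># Add NVMe mirrors as special allocation devices.
zpool add asp8 special mirror /dev/disk/by-vdev/N6 /dev/disk/by-vdev/N7
zpool add asp8 special mirror /dev/disk/by-vdev/N8 /dev/disk/by-vdev/N9
zpool add asp8 special mirror /dev/disk/by-vdev/N10 /dev/disk/by-vdev/N11

# Route blocks of 16K and smaller (plus all metadata) to the special vdevs.
zfs set special_small_blocks=16K asp8

# Check how full the special mirrors are.
zpool list -v asp8
</pre>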
<div class="moz-cite-prefix">On our newest systems, we have some
106-drive JBODs with 20TB drives and in order to reduce the chance
of multiple disk failures in a single draid device, we
reconfigured the pools to have 2 draid devices per pool, though
still one OST per pool and one OST per OSS. In this config we only
have one distributed spare per draid. Due to significant write
performance reasons we also (reluctantly) started spanning pools
across 2 JBODs. An additional difference is we had much less NVMe
capacity on these systems with just one small pair of NVMe drives
per enclosure, so we configure them as special devices for pool
metadata rather than for small block storage. The config for one
of those pools looks like the following:</div>
<pre class="moz-cite-prefix"> NAME STATE READ WRITE CKSUM
merced239 ONLINE 0 0 0
draid2:11d:53c:1s-0 ONLINE 0 0 0
L0 ONLINE 0 0 0
L2 ONLINE 0 0 0
L4 ONLINE 0 0 0
...
L100 ONLINE 0 0 0
L102 ONLINE 0 0 0
L104 ONLINE 0 0 0
draid2:11d:53c:1s-1 ONLINE 0 0 0
U1 ONLINE 0 0 0
U3 ONLINE 0 0 0
U5 ONLINE 0 0 0
...
U101 ONLINE 0 0 0
U103 ONLINE 0 0 0
U105 ONLINE 0 0 0
special
mirror-2 ONLINE 0 0 0
N2 ONLINE 0 0 0
N3 ONLINE 0 0 0
spares
draid2-0-0 AVAIL
draid2-1-0 AVAIL
</pre>
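<div class="moz-cite-prefix">Creating that kind of pool is just a matter of
listing two draid top-level vdevs in one create command and leaving
special_small_blocks at its default of 0, so the small NVMe mirror only
holds metadata. Roughly, with illustrative slot aliases (the L*/U* names
and their JBOD mapping are placeholders):</div>
<pre class="moz-cite-prefix"># Two dRAID2 vdevs, 11 data disks and 1 distributed spare each,
# 53 children apiece.
zpool create merced239 \
    draid2:11d:1s /dev/disk/by-vdev/L{0..104..2} \
    draid2:11d:1s /dev/disk/by-vdev/U{1..105..2}

# Add the single NVMe pair as a metadata-only special mirror
# (special_small_blocks stays at 0, so no small data blocks land on it).
zpool add merced239 special mirror /dev/disk/by-vdev/N2 /dev/disk/by-vdev/N3
</pre>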
<div class="moz-cite-prefix">Hope this helps,</div>
<div class="moz-cite-prefix">Cameron</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">On 2/6/25 11:29 AM, Nehring, Shane R
[ITS] wrote:<br>
</div>
<blockquote type="cite" cite="mid:0ee568f1b7803b8779dd2f5117e2f2ab3d17561d.camel@iastate.edu">
<pre wrap="" class="moz-quote-pre">Hello All,
I didn't want to hijack the other thread today about draid, but I have been
meaning to ask questions about it and folks' experience with it in the context
of Lustre. Most of my questions come from not having a chance to really play
around with draid much.
Have you been generally satisfied with the performance of a single draid vdev vs
either multiple pools/OSTs per node or a single OST on a pool spanning multiple
raidz(2) vdevs? Is random IO comparable to a span of raidz2 vdevs? I know
one of the pain points (more from a space usage perspective as I understand it)
is the fixed stripe width and how that impacts small files, but does small file
IO perform particularly badly on draid vs a span of raidz2 vdevs?
I've got hardware on order (a couple of 60-bay JBODs and heads) that's going to
replace some of the older OSTs in our current volume, and I'm leaning toward a
single draid-pool OST per OSS. I plan to do some benchmarking of the pools in
various configurations, but it's hard to generate a benchmark that's actually
representative of real-world usage.
If you've got any insights or anecdotes regarding your experience with draid and
Lustre I'd love to hear them!
Thanks,
Shane
</pre>
<pre wrap="" class="moz-quote-pre">_______________________________________________
lustre-discuss mailing list
<a class="moz-txt-link-abbreviated" href="mailto:lustre-discuss@lists.lustre.org">lustre-discuss@lists.lustre.org</a>
<a class="moz-txt-link-freetext" href="http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org">http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org</a>
</pre>
</blockquote>
</body>
</html>