<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>We've been using draid in production since 2020 and I think we're
generally happy with it. We have quite a few Lustre clusters, and
on the majority of them we run 90-drive JBODs with 1 OST/OSS
node, 1 OST/pool, and 1 pool/JBOD. We use a draid2:8d:90c:2s config
and let the distributed spares rebuild (~2-4 hours) before
replacing the 16TB physical disk, which then rebuilds within a day
or so. <br>
</p>
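<p>Roughly, the pool creation and the spare workflow look like the sketch
below. The device aliases (L0..L89, and L42 as the failed drive) are
illustrative placeholders, not our exact device names:</p>
<pre># Create a dRAID2 pool: 8 data disks per stripe, 2 distributed spares,
# 90 children (the child count is taken from the number of disks listed).
zpool create asp8 draid2:8d:2s /dev/disk/by-vdev/L{0..89}

# When a drive fails, attach a distributed spare (ZED can also do this
# automatically); this sequential rebuild is what finishes in a few hours.
zpool replace asp8 L42 draid2-0-0

# After the physical 16TB disk is swapped, rebuild onto it; once that
# completes, the distributed spare returns to AVAIL on its own.
zpool replace asp8 L42 /dev/disk/by-vdev/L42-new
</pre>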
<p>An important note about this configuration is that we also include
NVMe in the pool as special allocation devices, configured to store
small blocks up to 16K. We probably have much more NVMe space than
we need because the NVMe drives are large (zpool list -v shows each
mirror's allocation is still low), but we're happy with the performance.</p>
<pre>  NAME                    STATE     READ WRITE CKSUM
  asp8                    ONLINE       0     0     0
    draid2:8d:90c:2s-0    ONLINE       0     0     0
      L0                  ONLINE       0     0     0
      L1                  ONLINE       0     0     0
      ...
      L88                 ONLINE       0     0     0
      L89                 ONLINE       0     0     0
  special
    mirror-1              ONLINE       0     0     0
      N6                  ONLINE       0     0     0
      N7                  ONLINE       0     0     0
    mirror-2              ONLINE       0     0     0
      N8                  ONLINE       0     0     0
      N9                  ONLINE       0     0     0
    mirror-3              ONLINE       0     0     0
      N10                 ONLINE       0     0     0
      N11                 ONLINE       0     0     0
  spares
    draid2-0-0            AVAIL
    draid2-0-1            AVAIL
</pre>
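<p>The small-block routing mentioned above comes down to two steps: adding
the NVMe mirrors as special vdevs and setting the small-block cutoff so it
is inherited by the datasets. A rough sketch, with illustrative device
names rather than our exact ones:</p>
<pre># Add NVMe mirrors as special allocation devices.
zpool add asp8 special mirror /dev/disk/by-vdev/N6 /dev/disk/by-vdev/N7
zpool add asp8 special mirror /dev/disk/by-vdev/N8 /dev/disk/by-vdev/N9
zpool add asp8 special mirror /dev/disk/by-vdev/N10 /dev/disk/by-vdev/N11

# Route blocks of 16K and smaller (plus all metadata) to the special vdevs.
zfs set special_small_blocks=16K asp8

# Check how full the special mirrors are.
zpool list -v asp8
</pre>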
<div class="moz-cite-prefix">On our newest systems, we have some
106-drive JBODs with 20TB drives and in order to reduce the chance
of multiple disk failures in a single draid device, we
reconfigured the pools to have 2 draid devices per pool, though
still one OST per pool and one OST per OSS. In this config we only
have one distributed spare per draid. Due to significant write
performance reasons we also (reluctantly) started spanning pools
across 2 JBODs. An additional difference is we had much less NVMe
capacity on these systems with just one small pair of NVMe drives
per enclosure, so we configure them as special devices for pool
metadata rather than for small block storage. The config for one
of those pools looks like the following:</div>
<pre class="moz-cite-prefix"> NAME STATE READ WRITE CKSUM
merced239 ONLINE 0 0 0
draid2:11d:53c:1s-0 ONLINE 0 0 0
L0 ONLINE 0 0 0
L2 ONLINE 0 0 0
L4 ONLINE 0 0 0
...
L100 ONLINE 0 0 0
L102 ONLINE 0 0 0
L104 ONLINE 0 0 0
draid2:11d:53c:1s-1 ONLINE 0 0 0
U1 ONLINE 0 0 0
U3 ONLINE 0 0 0
U5 ONLINE 0 0 0
...
U101 ONLINE 0 0 0
U103 ONLINE 0 0 0
U105 ONLINE 0 0 0
special
mirror-2 ONLINE 0 0 0
N2 ONLINE 0 0 0
N3 ONLINE 0 0 0
spares
draid2-0-0 AVAIL
draid2-1-0 AVAIL
</pre>
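<div class="moz-cite-prefix">Creating that kind of pool is just a matter of
listing two draid top-level vdevs in one create command and leaving
special_small_blocks at its default of 0, so the small NVMe mirror only
holds metadata. Roughly, with illustrative slot aliases (the L*/U* names
and their JBOD mapping are placeholders):</div>
<pre class="moz-cite-prefix"># Two dRAID2 vdevs, 11 data disks and 1 distributed spare each,
# 53 children apiece.
zpool create merced239 \
    draid2:11d:1s /dev/disk/by-vdev/L{0..104..2} \
    draid2:11d:1s /dev/disk/by-vdev/U{1..105..2}

# Add the single NVMe pair as a metadata-only special mirror
# (special_small_blocks stays at 0, so no small data blocks land on it).
zpool add merced239 special mirror /dev/disk/by-vdev/N2 /dev/disk/by-vdev/N3
</pre>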
<div class="moz-cite-prefix">Hope this helps,</div>
<div class="moz-cite-prefix">Cameron</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">On 2/6/25 11:29 AM, Nehring, Shane R
[ITS] wrote:<br>
</div>
<blockquote type="cite" cite="mid:0ee568f1b7803b8779dd2f5117e2f2ab3d17561d.camel@iastate.edu">
<pre wrap="" class="moz-quote-pre">Hello All,
I didn't want to hijack the other thread today about draid, but I have been
meaning to ask questions about it and folks' experience with it in the context
of Lustre. Most of my questions come from not having a chance to really play
around with draid much.
Have you been generally satisfied with the performance of a single draid vdev vs
either multiple pools/OSTs per node or a single OST on a pool spanning multiple
raidz(2) vdevs? Is random IO comparable to a span of raidz2 vdevs? I know
one of the pain points (more from a space usage perspective as I understand it)
is the fixed stripe width and how that impacts small files, but does small file
IO perform particularly badly on draid vs a span of raidz2 vdevs?
I've got hardware on order (a couple of 60-bay JBODs and heads) that's going to
replace some of the older OSTs in our current volume, and I'm leaning toward a
single draid-pool OST per OSS. I plan to do some benchmarking of the pools in
various configurations, but it's hard to generate a benchmark that's actually
representative of real-world usage.
If you've got any insights or anecdotes regarding your experience with draid and
Lustre I'd love to hear them!
Thanks,
Shane
</pre>
<pre wrap="" class="moz-quote-pre">_______________________________________________
lustre-discuss mailing list
<a class="moz-txt-link-abbreviated" href="mailto:lustre-discuss@lists.lustre.org">lustre-discuss@lists.lustre.org</a>
<a class="moz-txt-link-freetext" href="http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org">http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org</a>
</pre>
</blockquote>
</body>
</html>