[lustre-discuss] OSTs per OSS with ZFS

Nathan R.M. Crawford nrcrawfo at uci.edu
Thu Jul 6 14:43:41 PDT 2017


On a somewhat related question, what are the expected trade-offs when
splitting the "striping" between ZFS (striping over vdevs) and Lustre
(striping over OSTs)?

Specific example: if one has an OSS with 40 disks and intends to use
10-disk raidz2 vdevs, how do the following options compare (sketched below)?

A) 4 OSTs, each on a zpool with a single raidz2 vdev,
B) 2 OSTs, each on a zpool with two vdevs, and
C) 1 OST, on a zpool with 4 vdevs?
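
For concreteness, a minimal sketch of how A and C might be set up (bash;
the disk names sdb..sdao, the fsname, and the MGS NID are placeholders
for this example):

  # 40 data disks (hypothetical names)
  DISKS=(sd{b..z} sda{a..o})

  # Option A: four pools, each a single 10-disk raidz2 vdev
  for i in 0 1 2 3; do
      zpool create ost$i raidz2 "${DISKS[@]:$((i*10)):10}"
      mkfs.lustre --ost --backfstype=zfs --fsname=testfs --index=$i \
          --mgsnode=10.0.0.1@o2ib ost$i/ost$i
  done

  # Option C: one pool striped across four 10-disk raidz2 vdevs
  zpool create ost0 \
      raidz2 "${DISKS[@]:0:10}" \
      raidz2 "${DISKS[@]:10:10}" \
      raidz2 "${DISKS[@]:20:10}" \
      raidz2 "${DISKS[@]:30:10}"

Option B is the middle ground: two pools of two raidz2 vdevs each.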

  I've done some simple testing with obdfilter-survey and multiple-client
file operations on some actual user data, and am leaning toward "A".
However, the differences weren't overwhelming, and I am probably neglecting
some important corner cases. Handling the striping at the Lustre level (A)
also allows tuning on a per-file basis, as in the sketch below.
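
For reference, the survey runs and the per-file tuning look roughly like
this (a sketch; the target name, mount point, and stripe parameters are
placeholders):

  # obdfilter-survey (from lustre-iokit), run on the OSS against one OST
  nobjhi=2 thrhi=16 size=8192 targets="testfs-OST0000" sh obdfilter-survey

  # with option A, stripe a directory's new files across all 4 OSTs...
  lfs setstripe -c 4 -S 1M /mnt/testfs/wide
  # ...or pin a small-file workload to a single OST
  lfs setstripe -c 1 /mnt/testfs/narrow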

-Nate

On Mon, Jul 3, 2017 at 1:15 AM, Dilger, Andreas <andreas.dilger at intel.com>
wrote:

> We have seen performance improvements with multiple zpools/OSTs per OSS.
> However, with only 5x NVMe devices per OSS you don't have many choices in
> terms of redundancy, unless you skip redundancy entirely and just go for
> raw bandwidth?
>
> The other thing to consider is the network bandwidth vs. the NVMe
> bandwidth. With similar test systems using NVMe devices without redundancy
> we've seen multiple GB/s, so if you aren't using an OPA/IB network, the
> network will likely be your bottleneck. Even if TCP is fast enough, the
> CPU overhead and data copies will probably kill the performance.
>
> In the end, you can probably test a few configs to see which one gives
> the best performance: a mirror, a single RAID-Z, two RAID-Z pools on
> half-sized partitions, five no-redundancy zpools with one VDEV each, and a
> single no-redundancy zpool with five VDEVs, as sketched below.
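>
> For example (a sketch, assuming the five devices are nvme0n1..nvme4n1):
>
>   # single raidz across all five devices
>   zpool create ost0 raidz nvme0n1 nvme1n1 nvme2n1 nvme3n1 nvme4n1
>
>   # five no-redundancy zpools, one single-device VDEV each
>   for i in 0 1 2 3 4; do zpool create ost$i nvme${i}n1; done
>
>   # single no-redundancy zpool striped across five VDEVs
>   zpool create ost0 nvme0n1 nvme1n1 nvme2n1 nvme3n1 nvme4n1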
>
> Cheers, Andreas
>
> PS - there is initial snapshot functionality in the 2.10 release.
>
> > On Jul 2, 2017, at 10:07, Brian Andrus <toomuchit at gmail.com> wrote:
> >
> > All,
> >
> > We have been having some discussion about the best practices when
> > creating OSTs with ZFS.
> >
> > The basic question is: What is the best ratio of OSTs per OSS when
> > using ZFS?
> > It is easy enough to do a single OST with all disks and have reliable
> > data protection provided by ZFS. That may become an even better scenario
> > once Lustre snapshots arrive as a feature as well.
> >
> > However, multiple OSTs can mean more stripes and faster reads/writes. I
> > have seen some tests that were done quite some time ago, which may no
> > longer be valid with the updates to Lustre.
> >
> > We have a test system whose OSS nodes have 5 NVMe devices each. We can
> > build one ZFS file system from all of them, or split them into 5
> > separate pools (which would forgo some of the features of ZFS).
> >
> > Any prior experience/knowledge/suggestions would be appreciated.
> >
> > Brian Andrus
> >



-- 

Dr. Nathan Crawford              nathan.crawford at uci.edu
Modeling Facility Director
Department of Chemistry
1102 Natural Sciences II         Office: 2101 Natural Sciences II
University of California, Irvine  Phone: 949-824-4508
Irvine, CA 92697-2025, USA