[lustre-discuss] Mixed size OST's

Patrick Farrell paf at cray.com
Tue Mar 20 12:51:24 PDT 2018


Not unless you set a very complex layout, with a lot of components in it (which has various flaws of its own).  Otherwise you’ll fairly quickly hit your final component for large files, and then you’re stuck.  This limitation is the motivation behind the work proposed in LU-10070.
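For reference, a multi-component PFL layout of the kind described above might look roughly like this (a sketch only: the directory /mnt/lustre/testdir, the extent boundaries and the stripe counts are illustrative assumptions, and the -E/-c composite syntax assumes Lustre 2.10 or later):

    # each -E ends a component at the given file offset; the stripe count (-c)
    # grows as the file gets larger, with -E -1 / -c -1 covering the rest of
    # the file on all available OSTs
    lfs setstripe \
        -E 64M -c 1 \
        -E 1G  -c 4 \
        -E 16G -c 8 \
        -E -1  -c -1 \
        /mnt/lustre/testdir

    # inspect the resulting composite layout
    lfs getstripe /mnt/lustre/testdir

Once a file grows into that final to-end-of-file component, its layout cannot change any further, which is the "stuck" case described above.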


  *   Patrick

From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of "E.S. Rosenberg" <esr+lustre at mail.hebrew.edu>
Date: Tuesday, March 20, 2018 at 2:46 PM
To: "Dilger, Andreas" <andreas.dilger at intel.com>
Cc: "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
Subject: Re: [lustre-discuss] Mixed size OST's

Doesn't PFL also 'solve'/mitigate this issue, in the sense that a file doesn't have to remain restricted to the OST(s) it started on?
(And as such, balancing would even continue as files grow.)
Regards,
Eli

On Fri, Mar 16, 2018 at 9:57 PM, Dilger, Andreas <andreas.dilger at intel.com> wrote:
On Mar 15, 2018, at 09:48, Steve Thompson <smt at vgersoft.com> wrote:
>
> Lustre newbie here (1 month). Lustre 2.10.3, CentOS 7.4, ZFS 0.7.5. All networking is 10 GbE.
>
> I am building a test Lustre filesystem. So far, I have two OSSes, each with 30 disks of 2 TB each, all in a single zpool per OSS. Everything works well and was surprisingly easy to build. Thus, two OSTs of 60 TB each. The files are mostly home directories. Clients number about 225 HPC systems (about 2400 cores).
>
> In about a month, I will have a third OSS available, and about a month after that, a fourth. Each of these two systems has 48 disks of 4 TB each. I am looking for advice on how best to configure this. If I go with one OST per system (one zpool comprising 8 x 6-disk RAIDZ2 vdevs), I will have a Lustre f/s comprising two 60 TB OSTs and two 192 TB OSTs (minus RAIDZ2 overhead). This is obviously a big mismatch between OST sizes. I have not encountered any discussion of the effect of mixing disparate OST sizes. I could instead format two 96 TB OSTs on each system (two zpools of 4 x 6-disk RAIDZ2 vdevs), or three 64 TB OSTs, and so on. More OSTs means more striping possibilities, but fewer vdevs per zpool impacts ZFS performance negatively. More OSTs per OSS does not help with network bandwidth to the OSS. How would you go about this?

This is a little bit tricky.  Lustre itself can handle different OST sizes,
as it will run in "QOS allocator" mode (essentially "Quantity of Space"; the
full "Quality of Service" was never implemented).  This balances file
allocation across OSTs based on percentage of free space, at the expense of
lower performance, since only the two new OSTs would be used for
192/252 ~= 76% of the files; it isn't possible to *also* use all the OSTs
evenly at the same time (assuming that network speed is your bottleneck, and
not disk speed).
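To see this behaviour on a live system, a couple of read-only checks are enough (the mount point /mnt/lustre is a hypothetical example, and the exact parameter prefix varies between versions and between client and MDS, so treat the parameter names as assumptions to verify):

    # per-OST capacity and usage, as seen from a client
    lfs df -h /mnt/lustre

    # QOS allocator tunables; these may live under lov.* (client/older
    # releases) or lod.* (MDS) -- confirm with 'lctl list_param lod.*'
    lctl get_param lod.*.qos_prio_free lod.*.qos_threshold_rr

Roughly speaking, qos_prio_free controls how heavily free space is weighted in allocation decisions, and qos_threshold_rr is the imbalance threshold below which the allocator falls back to plain round-robin.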

For home directory usage this may not be a significant issue.  The
performance imbalance would even out as the larger OSTs fill up, and would
not be seen at all when files are striped across all OSTs.
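As a minimal sketch, striping across all OSTs just means a stripe count of -1, set here on a hypothetical directory so that new files under it inherit it:

    lfs setstripe -c -1 /mnt/lustre/shared

Files created there are then spread over every OST, old and new alike.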

I also thought about creating 3x OSTs per new OSS, so they would all be about
the same size and allocated equally.  That means the new OSS nodes would see
about 3x as much IO traffic as the old ones, especially for files striped over
all OSTs.  The drawback here is that the performance imbalance would stay
forever, so in the long run I don't think this is as good as just having a
single larger OST.  This will also become less of a factor as more OSTs are
added to the filesystem and/or you eventually upgrade the initial OSTs to
have larger disks and/or more vdevs.
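To make the comparison concrete, a rough sketch of the single-OST-per-OSS layout for one of the new 48-disk nodes might look like the following; every name here (pool name, device names, fsname, index, MGS NID) is a placeholder, not something taken from this thread:

    # one pool of 8 x 6-disk RAIDZ2 vdevs (48 disks total)
    zpool create -o ashift=12 ost3pool \
        raidz2 sda sdb sdc sdd sde sdf \
        raidz2 sdg sdh sdi sdj sdk sdl \
        raidz2 sdm sdn sdo sdp sdq sdr \
        raidz2 sds sdt sdu sdv sdw sdx \
        raidz2 sdy sdz sdaa sdab sdac sdad \
        raidz2 sdae sdaf sdag sdah sdai sdaj \
        raidz2 sdak sdal sdam sdan sdao sdap \
        raidz2 sdaq sdar sdas sdat sdau sdav

    # format the pool as a single ZFS-backed OST for the existing filesystem
    mkfs.lustre --ost --backfstype=zfs \
        --fsname=testfs --index=2 \
        --mgsnode=mgs01@tcp ost3pool/ost2

The three-OST alternative would repeat the same steps with three smaller pools per OSS, each formatted and mounted as its own OST.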


Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation

_______________________________________________
lustre-discuss mailing list
lustre-discuss at lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
