[lustre-discuss] stripe count recommendation, and proposal for auto-stripe tool

E.S. Rosenberg esr+lustre at mail.hebrew.edu
Sun May 29 10:16:24 PDT 2016


After (finally) reading this interesting discussion I was left with one
question:
Some of the rules suggested above would imply quite a large number of
stripes as files get truly big; isn't the (logical) upper limit on striping
the number of OSTs you have in the system?
Striping across more than the OST count intuitively seems
counter-productive (granted, on our fairly small system we already have 15
OSTs, so depending on which rule is used, files could reach 1.5T under the
100G rule before hitting that limit)...
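
For what it's worth, a quick sketch of how one might enforce that cap
(assuming $MOUNT is the client mount point and $count holds a stripe count
computed by whatever rule ends up being used):

# ost_count=$(lfs df $MOUNT | grep -c OST)
# [ "$count" -gt "$ost_count" ] && count=$ost_count

(Alternatively, "lfs setstripe -c -1" already means "stripe over all
available OSTs", which puts the same ceiling in place.)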

Thanks,
Eli

On Thu, May 26, 2016 at 8:13 PM, Nathan Dauchy - NOAA Affiliate <
nathan.dauchy at noaa.gov> wrote:

> Andreas,
>
> Thanks very much for your comments...
>
> On Wed, May 18, 2016 at 1:30 PM, Dilger, Andreas <andreas.dilger at intel.com
> > wrote:
>
>>
>> On 2016/05/18, 11:22, "Nathan Dauchy - NOAA Affiliate" <
>> nathan.dauchy at noaa.gov> wrote:
>>
>> I'm looking for your experience and perhaps some lively discussion
>> regarding "best practices" for choosing a file stripe count.  The Lustre
>> manual has good tips on "Choosing a Stripe Size", and in practice the
>> default 1M rarely causes problems on our systems.  Stripe count, on the
>> other hand, is far more difficult: it is hard to choose a single value that
>> is efficient for a general-purpose, multi-use, site-wide file system.
>> What do you all recommend as a reasonable rule of thumb that works for
>> "most" users' needs, where the stripe count can be determined based only on
>> static data attributes (such as file size)?
>>
>> Using the log2() value seems reasonable.
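>>
>> (One reading of that rule is stripe_count = 1 + floor(log2(size in GiB));
>> taken that way it would give roughly 1 stripe at 1 GiB, 4 at 8 GiB, 7 at
>> 100 GiB, and 11 at 1 TiB.)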
>>
>> Ideally, I would like to have a tool to give the users and say "go
>> restripe your directory with this command" and it will do the right thing
>> in 90% of cases.  See the rough patch to lfs_migrate (included below) which
>> should help explain what I'm thinking.  Probably there are more efficient
>> ways of doing things, but I have tested it lightly and it works as a
>> proof-of-concept.
>>
>>
>> I'd welcome this as a patch submitted to Gerrit.
>>
>>
> A Jira ticket has been created:
> https://jira.hpdd.intel.com/browse/LU-8207
>
> The draft patch is there, and probably needs a bit of work before pushing
> into Gerrit.  If anyone wants to tackle that, assistance appreciated of
> course! :)
>
>> With a good programmatic rule of thumb, we (as a Lustre community!) can
>> eventually work with application developers to embed the stripe count
>> selection into their code and get things at least closer to right up
>> front.  Even if trial and error is involved to find the optimal setting, at
>> least the rule of thumb can be a _starting_point_ for the users, and they
>> can tweak it from there based on application, model, scale, dataset, etc.
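>>
>> (As a sketch of what "embedding" could look like at the job-script level,
>> with the path and count below purely illustrative: pre-create the output
>> file with the chosen layout before the application opens it, e.g.
>>
>>   # lfs setstripe -c 8 /scratch/run42/output.dat
>>
>> and the application then inherits that striping when it writes to the
>> existing file.)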
>>
>> Thinking farther down the road, with progressive file layout, what
>> algorithm will be used as the default?
>>
>>
>> To be clear, the PFL implementation does not currently have an
>> algorithmic layout, but rather a series of thresholds based on file size
>> that will select different layouts (initially stripe counts, but it could
>> be anything, including stripe size, OST pools, etc.).  The PFL size
>> thresholds and stripe counts _could_ be set up (manually) as a geometric
>> series, but they can also be totally arbitrary if you want.
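>>
>> (Purely as an illustration of that kind of manual setup, with the syntax
>> hypothetical since PFL has not been released yet, a geometric series of
>> thresholds might look something like:
>>
>>   # lfs setstripe -E 1G -c 1 -E 16G -c 4 -E 256G -c 16 -E -1 -c -1 /mnt/fs/dir
>>
>> i.e. 1 stripe up to 1 GiB, 4 up to 16 GiB, 16 up to 256 GiB, and all OSTs
>> beyond that.)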
>>
>
> Understood.  However, Lustre will still need to have some sort of default
> layout.  I was thinking that it would be good to match that future code
> with current best-practice recommendations and whatever ends up in
> lfs_migrate for auto-striping.
>
>
>>
>> If Lustre gets to the point where it can rebalance OST capacity behind
>> the scenes, could it also make some intelligent choice about restriping
>> very large files to spread out load and better balance capacity?  (Would
>> that mean we need a bit set on the file to flag whether the stripe info was
>> set specifically by the user, set automatically by Lustre tools, or left at
>> the system default?)  Can the filesystem track concurrent access
>> to a file, and perhaps migrate the file and adjust stripe count based on
>> number of active clients?
>>
>>
>> I think this would be an interesting task for RobinHood, since it already
>> has much of this information.  It could find large files with low stripe
>> counts and restripe them during OST rebalancing.
>>
>
> Yes, the need to rebalance OSTs when adding new ones to the file system is
> in part what prompted this topic.  We have only experimented with Robinhood
> as a low-priority task, but hope to use it more in the future.
>
> I was picturing that the general rebalance process (without RobinHood)
> would be something like:
>
> * Identify the fullest OSTs with something like:
> # lfs df $FS | grep OST | sort -k 4 -n | head -n 4
>
> * Search for singly-striped, large, and inactive files on those OSTs with:
> # lfs find * -type f -mtime +30 -size +8G -c 1 -O A,B,N,X > filelist
>
> * Restripe those files with:
> # lfs_migrate -A -y < filelist
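>
> (Tying those three steps together as a sketch only, assuming $FS is the
> client mount point and that "A,B,N,X" above stands for the OST UUIDs
> reported by "lfs df":
>
> # OSTS=$(lfs df $FS | grep OST | sort -k 4 -n | head -n 4 | cut -d' ' -f1)
> # lfs find $FS -type f -mtime +30 -size +8G -c 1 -O $(echo $OSTS | tr ' ' ',') > filelist
> # lfs_migrate -A -y < filelist
>
> Whether "-O" takes the UUIDs in exactly that comma-separated form may
> depend on the Lustre version, so treat it as a starting point rather than
> a recipe.)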
>
>
>> One last comment on the patch below:
>> Instead of involving "bc", which is not guaranteed to be installed, why
>> not just have a simple "divide by 2, increment stripe_count" loop after
>> converting bytes to GiB?  That would be a few cycles for huge files, but
>> probably still faster than fork/exec of an external binary as it could be
>> at most 63 - 30 = 33 loops and usually many fewer.
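>>
>> (A minimal sketch of that loop, with the variable names assumed since
>> they depend on the patch:
>>
>>   stripe_count=1
>>   size_gib=$((file_size / 1024 / 1024 / 1024))
>>   while [ $size_gib -gt 1 ]; do
>>       size_gib=$((size_gib / 2))
>>       stripe_count=$((stripe_count + 1))
>>   done
>>
>> which yields 1 + floor(log2(size in GiB)) without forking bc.)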
>>
>
> Good point.  I made a note to that effect in the Jira ticket.  In general,
> I would think that external commands in the "coreutils" package are OK
> (cut, wc, head, tr, comm) but others (bc, sed, awk, grep) should be avoided.
>
> Cheers,
> Nathan
>
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>