[Lustre-discuss] Optimal strategy for OST distribution

Cliff White cliffw at whamcloud.com
Thu Mar 31 14:27:24 PDT 2011


No, the algorithm is not purely random; it is weighted on QOS, free space,
and a few other things. When a stripe is chosen on one OSS, we add a
penalty to the other OSTs on that OSS to prevent IO bunching on a single
OSS.
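
To make that concrete, here is a rough sketch of the weighting idea. This
is illustrative C only, not the actual Lustre allocator code; the weights,
the penalty value, and the consecutive OST-to-OSS mapping are all made up
for the example:

    /* Illustrative sketch, not the actual Lustre allocator.
     * OST weights start from a QOS measure such as free space; once a
     * stripe lands on an OSS, the sibling OSTs on that OSS are
     * penalized so a multi-stripe file spreads across servers. */
    #include <stdio.h>

    #define NUM_OST     9
    #define OST_PER_OSS 3

    int main(void)
    {
        long weight[NUM_OST];           /* ~free space per OST (made up) */
        int stripe_count = 3;

        for (int i = 0; i < NUM_OST; i++)
            weight[i] = 100;            /* pretend equal free space */

        for (int s = 0; s < stripe_count; s++) {
            int best = 0;               /* OST with the highest weight wins */
            for (int i = 1; i < NUM_OST; i++)
                if (weight[i] > weight[best])
                    best = i;

            int oss = best / OST_PER_OSS;   /* assumed consecutive OST->OSS map */
            printf("stripe %d -> OST_%d (OSS_%d)\n", s, best + 1, oss + 1);

            weight[best] = 0;           /* this OST has been used */
            for (int i = oss * OST_PER_OSS; i < (oss + 1) * OST_PER_OSS; i++)
                weight[i] -= 50;        /* penalize siblings on that OSS */
        }
        return 0;
    }

With those numbers, a 3-stripe file lands on OST_1, OST_4, and OST_7, one
per OSS, which is exactly the spreading the penalty is meant to produce.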
cliffw


On Thu, Mar 31, 2011 at 1:59 PM, Jeremy Filizetti <
jeremy.filizetti at gmail.com> wrote:

> Is this a feature implemented after 1.8.5?  In the past, default striping
> without an offset resulted in sequential stripe allocation according to
> client device order for a striped file.  Basically, the order in which the
> OSTs were mounted after the last --writeconf is the order in which the
> targets are added to the client llog and allocated.
>
> It's probably not a big deal with lots of clients, but for a small number
> of clients doing large sequential IO, or working over the WAN, it is.  So
> regardless of an A or B configuration, a file with a stripe count of 3
> could end up issuing IO to a single OSS instead of round-robining across
> the socket/queue pairs to each OSS.
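>
> If deterministic placement really matters, one hedge is to pin it per
> file.  Here is a minimal sketch using liblustreapi (the path and sizes
> are just examples; link with -llustreapi):
>
>     /* Create a file with 3 stripes of 1 MiB each, the first stripe
>      * pinned on OST index 0, so placement is deterministic rather
>      * than left to the allocator. */
>     #include <stdio.h>
>     #include <lustre/lustreapi.h>
>
>     int main(void)
>     {
>         int rc = llapi_file_create("/mnt/lustre/bigfile",
>                                    1048576, /* stripe_size in bytes */
>                                    0,       /* stripe_offset: first OST index */
>                                    3,       /* stripe_count */
>                                    0);      /* stripe_pattern: 0 = default RAID0 */
>         if (rc)
>             fprintf(stderr, "llapi_file_create failed: %d\n", rc);
>         return rc ? 1 : 0;
>     }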
>
> Jeremy
>
>
> On Thu, Mar 31, 2011 at 11:06 AM, Kevin Van Maren <
> kevin.van.maren at oracle.com> wrote:
>
>> It used to be that multi-stripe files were created on sequential OST
>> indexes, and that starting OST indexes were assigned sequentially to
>> newly created files.
>> As Lustre now adds greater randomization, the strategy for assigning
>> OSTs to OSS nodes (and to storage hardware, which often limits the
>> aggregate performance of multiple OSTs) is less important.
>>
>> While I have normally gone with "a", "b" can make it easier to remember
>> where OSTs are located, and it also keeps a uniform convention if the
>> storage system is later grown.
>>
>> Kevin
>>
>>
>> Heckes, Frank wrote:
>> > Hi all,
>> >
>> > sorry if this question has been answered before.
>> >
>> > What is the optimal 'strategy' for assigning OSTs to OSS nodes:
>> >
>> > -a- Assign OSTs to the OSSes via round-robin
>> > -b- Assign in consecutive order (as long as the backend storage provides
>> >     enough capacity for IOPS and bandwidth)
>> > -c- Something 'in-between' the 'extremes' of -a- and -b-
>> >
>> > E.g.:
>> >
>> > -a-     OSS_1           OSS_2           OSS_3
>> >           |_              |_              |_
>> >             OST_1           OST_2           OST_3
>> >             OST_4           OST_5           OST_6
>> >             OST_7           OST_8           OST_9
>> >
>> > -b-     OSS_1           OSS_2           OSS_3
>> >           |_              |_              |_
>> >             OST_1           OST_4           OST_7
>> >             OST_2           OST_5           OST_8
>> >             OST_3           OST_6           OST_9
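>> >
>> > (As a worked example of the two layouts, here is a tiny illustrative
>> > program; OSTs and OSSes are numbered from 1 as in the diagrams:)
>> >
>> >     /* print the OST-to-OSS mapping for layouts -a- and -b- */
>> >     #include <stdio.h>
>> >
>> >     int main(void)
>> >     {
>> >         for (int ost = 1; ost <= 9; ost++)
>> >             printf("OST_%d: -a- OSS_%d, -b- OSS_%d\n",
>> >                    ost, (ost - 1) % 3 + 1, (ost - 1) / 3 + 1);
>> >         return 0;
>> >     }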
>> >
>> > I thought -a- would be best for both task-local I/O (each task writes
>> > to its own file) and single-file I/O (all tasks write to a single
>> > file), since it is like the RAID-0 approach used in disk I/O (and Sun
>> > created our first FS this way).
>> > Has anyone made systematic investigations into which approach is best,
>> > or does anyone have an educated opinion?
>> > Many thanks in advance.
>> > BR
>> >
>> > -Frank Heckes
>> >


-- 
cliffw
Support Guy
WhamCloud, Inc.
www.whamcloud.com