[Lustre-discuss] Optimal strategy for OST distribution

Jeremy Filizetti jeremy.filizetti at gmail.com
Thu Mar 31 13:59:45 PDT 2011


Is this a feature implemented after 1.8.5?  In the past, default striping
without an offset resulted in sequential stripe allocation according to
client device order for a striped file.  Basically, the order in which the
OSTs were mounted after the last --writeconf is the order in which the
targets are added to the client llog and allocated from.
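
(If you want to check the allocation order on your own system, the commands
below are only a sketch -- the mount point /mnt/lustre and the file name are
placeholders:

    # 3 stripes, no explicit starting offset (-i -1), default stripe size
    lfs setstripe -c 3 -i -1 /mnt/lustre/testfile
    # show which OST indexes (obdidx) the objects actually landed on
    lfs getstripe /mnt/lustre/testfile

The obdidx column from getstripe is what tells you whether a file's objects
were spread across servers or piled onto OSTs behind one OSS.)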

It's probably not a big deal for lots of clients, but for a small number of
clients doing large sequential I/O or working over the WAN it is.  So
regardless of an A or B configuration, a file with a stripe count of 3 could
end up issuing I/O to a single OSS instead of round-robining across the
socket/queue pairs to each OSS.
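
(A rough way to see this from a client is to map each OST back to the server
it is connected to -- the filesystem name and NIDs below are placeholders,
and the ost_conn_uuid parameter may sit under a slightly different path
depending on the release:

    # server NID each OSC (i.e. each OST) is currently connected to
    lctl get_param osc.*.ost_conn_uuid
    # client device list, roughly the order targets were added from the llog
    lctl dl | grep osc

If several consecutive entries in that order point at the same NID, a
3-stripe file can end up with all of its objects behind that one OSS.)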

Jeremy

On Thu, Mar 31, 2011 at 11:06 AM, Kevin Van Maren <
kevin.van.maren at oracle.com> wrote:

> It used to be that multi-stripe files were created with sequential OST
> indexes.  It also used to be that OST indexes were sequentially assigned
> to newly-created files.
> As Lustre now adds greater randomization, the strategy for assigning
> OSTs to OSS nodes (and storage hardware, which often limits the
> aggregate performance of multiple OSTs) is less important.
>
> While I have normally gone with "a", "b" can make it easier to remember
> where OSTs are located, and also keep a uniform convention if the
> storage system is later grown.
>
> Kevin
>
>
> Heckes, Frank wrote:
> > Hi all,
> >
> > sorry if this question has been answered before.
> >
> > What is the optimal 'strategy' for assigning OSTs to OSS nodes:
> >
> > -a- Assign OSTs via round-robin to the OSS nodes
> > -b- Assign in consecutive order (as long as the backend storage provides
> >     enough capacity for IOPS and bandwidth)
> > -c- Something 'in-between' the 'extremes' of -a- and -b-
> >
> > E.g.:
> >
> > -a-     OSS_1           OSS_2           OSS_3
> >           |_              |_              |_
> >             OST_1           OST_2           OST_3
> >             OST_4           OST_5           OST_6
> >             OST_7           OST_8           OST_9
> >
> > -b-     OSS_1           OSS_2           OSS_3
> >           |_              |_              |_
> >             OST_1           OST_4           OST_7
> >             OST_2           OST_5           OST_8
> >             OST_3           OST_6           OST_9
> >
> > I thought -a- would be best for task-local (each task writes to its own
> > file) and single-file (all tasks write to a single file) I/O, since it is
> > like a RAID-0 approach used for disk I/O (and Sun created our first FS
> > this way).  Has anyone made a systematic investigation into which approach
> > is best, or does anyone have an educated opinion?
> > Many thanks in advance.
> > BR
> >
> > -Frank Heckes
> >
>

