No, the algorithm is not purely random; it is weighted on QOS, space and a few other things.<div>When a stripe is chosen on one OSS, we add a penalty to the other OSTs on that OSS to prevent </div><div>IO bunching on one OSS. </div>
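<div>As a rough illustration of the penalty idea only (this is not the actual Lustre allocator code), here is a minimal Python sketch. It assumes free space is the only weight and leaves QOS and the other inputs out; the function name <code>pick_stripes</code> and the <code>oss_penalty</code> factor are hypothetical.</div>
<pre>
```python
import random


def pick_stripes(free_space, ost_to_oss, stripe_count, oss_penalty=0.25, rng=None):
    """Pick stripe_count distinct OSTs by weighted random choice.

    free_space:  dict mapping OST name to free space (used as the weight)
    ost_to_oss:  dict mapping OST name to the OSS that serves it
    oss_penalty: hypothetical factor applied to the remaining OSTs of an OSS
                 each time one of its OSTs is chosen (assumption, not a Lustre tunable)
    """
    rng = rng or random.Random()
    weights = dict(free_space)  # work on a copy so the caller's data is untouched
    chosen = []
    for _ in range(stripe_count):
        candidates = [ost for ost in weights if ost not in chosen]
        ost = rng.choices(candidates, weights=[weights[o] for o in candidates])[0]
        chosen.append(ost)
        # Penalize the other OSTs on the same OSS so that later stripes
        # tend to land on a different server.
        for other in candidates:
            if other != ost and ost_to_oss[other] == ost_to_oss[ost]:
                weights[other] *= oss_penalty
    return chosen


# Nine OSTs on three OSSes, three consecutive OSTs per OSS, equal free space.
ost_to_oss = {f"OST_{i}": f"OSS_{(i - 1) // 3 + 1}" for i in range(1, 10)}
free_space = {ost: 100 for ost in ost_to_oss}
print(pick_stripes(free_space, ost_to_oss, 3, rng=random.Random(0)))
```
</pre>
<div>In this sketch, <code>oss_penalty=0.0</code> excludes the sibling OSTs outright, while values between 0 and 1 only make them less likely to be picked.</div>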

<div>cliffw</div><div> <br><br><div class="gmail_quote">On Thu, Mar 31, 2011 at 1:59 PM, Jeremy Filizetti <span dir="ltr"><<a href="mailto:jeremy.filizetti@gmail.com">jeremy.filizetti@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Is this a feature implemented after 1.8.5?  In the past, default striping without an offset resulted in sequential stripe allocation according to client device order for a striped file.  Basically, the order in which OSTs were mounted after the last --writeconf is the order in which the targets are added to the client llog and allocated.  <br>


<br>It's probably not a big deal with lots of clients, but it is for a small number of clients doing large sequential IO or working over the WAN.  So regardless of an A or B configuration, a file with a stripe count of 3 could end up issuing IO to a single OSS instead of using round-robin between the socket/queue pair to each OSS.<br>
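<div>To make the concern concrete, here is a small Python sketch of sequential allocation over the client device order. It models only what is described above; the helper names and the per-OSS mount order are assumptions for illustration, and the layouts follow the -a- and -b- diagrams from the original question.</div>
<pre>
```python
def layout_a(num_oss, osts_per_oss):
    """Layout -a-: OSTs assigned round-robin (OST_1 on OSS_1, OST_2 on OSS_2, ...)."""
    total = num_oss * osts_per_oss
    return {f"OST_{i}": f"OSS_{(i - 1) % num_oss + 1}" for i in range(1, total + 1)}


def layout_b(num_oss, osts_per_oss):
    """Layout -b-: OSTs assigned consecutively (OST_1..OST_3 on OSS_1, ...)."""
    total = num_oss * osts_per_oss
    return {f"OST_{i}": f"OSS_{(i - 1) // osts_per_oss + 1}" for i in range(1, total + 1)}


def sequential_stripes(device_order, start, stripe_count):
    """Take stripe_count consecutive OSTs from the device order, wrapping around."""
    n = len(device_order)
    return [device_order[(start + k) % n] for k in range(stripe_count)]


def oss_used(stripes, ost_to_oss):
    """Return the sorted set of OSS nodes that serve the given stripes."""
    return sorted({ost_to_oss[ost] for ost in stripes})


a, b = layout_a(3, 3), layout_b(3, 3)
by_index = [f"OST_{i}" for i in range(1, 10)]
# Hypothetical mount order: all OSTs of OSS_1 first, then OSS_2, then OSS_3.
by_oss_mount = sorted(a, key=lambda ost: a[ost])

cases = [
    ("a, index order", by_index, a),
    ("b, index order", by_index, b),
    ("a, per-OSS mount order", by_oss_mount, a),
]
for name, order, layout in cases:
    stripes = sequential_stripes(order, 0, 3)
    print(name, stripes, oss_used(stripes, layout))
```
</pre>
<div>In the second and third cases all three stripes end up on one OSS, which is the situation that matters for a few clients doing large sequential IO.</div>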

<font color="#888888">

<br>Jeremy</font><div><div></div><div class="h5"><br><br><div class="gmail_quote">On Thu, Mar 31, 2011 at 11:06 AM, Kevin Van Maren <span dir="ltr"><<a href="mailto:kevin.van.maren@oracle.com" target="_blank">kevin.van.maren@oracle.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204, 204, 204);padding-left:1ex">

It used to be that multi-stripe files were created with sequential OST<br>

indexes.  It also used to be that OST indexes were sequentially assigned<br>

to newly-created files.<br>

As Lustre now adds greater randomization, the strategy for assigning<br>

OSTs to OSS nodes (and storage hardware, which often limits the<br>

aggregate performance of multiple OSTs) is less important.<br>

<br>

While I have normally gone with "a", "b" can make it easier to remember<br>

where OSTs are located, and also keep a uniform convention if the<br>

storage system is later grown.<br>

<font color="#888888"><br>

Kevin<br>

</font><div><div></div><div><br>

<br>

Heckes, Frank wrote:<br>

> Hi all,<br>

><br>

> sorry if this question has been answered before.<br>

><br>

> What is the optimal 'strategy' for assigning OSTs to OSS nodes:<br>

><br>

> -a- Assign OST via round-robin to the OSS<br>

> -b- Assign in consecutive order (as long as the backend storage provides<br>

>     enough capacity for iops and bandwidth)<br>

> -c- Something 'in-between' the 'extremes' of -a- and -b-<br>

><br>

> E.g.:<br>

><br>

> -a-     OSS_1           OSS_2           OSS_3<br>

>           |_              |_              |_<br>

>             OST_1           OST_2           OST_3<br>

>             OST_4           OST_5           OST_6<br>

>             OST_7           OST_8           OST_9<br>

><br>

> -b-     OSS_1           OSS_2           OSS_3<br>

>           |_              |_              |_<br>

>             OST_1           OST_4           OST_7<br>

>             OST_2           OST_5           OST_8<br>

>             OST_3           OST_6           OST_9<br>

><br>

> I thought -a- would be best for task-local (each task writes to its own<br>

> file) and single-file (all tasks write to a single file) I/O, since it's like<br>

> the RAID-0 approach used for disk I/O (and Sun created our first FS this way).<br>

> Has anyone made any systematic investigations into which approach is best,<br>

> or does anyone have an educated opinion?<br>

> Many thanks in advance.<br>

> BR<br>

><br>

> -Frank Heckes<br>

><br>

> ------------------------------------------------------------------------------------------------<br>

> ------------------------------------------------------------------------------------------------<br>

> Forschungszentrum Juelich GmbH<br>

> 52425 Juelich<br>

> Registered office: Juelich<br>

> Registered in the commercial register of the Dueren District Court, No. HR B 3498<br>

> Chairman of the Supervisory Board: MinDirig Dr. Karl Eugen Huthmacher<br>

> Board of Directors: Prof. Dr. Achim Bachem (Chairman),<br>

> Dr. Ulrich Krafft (Deputy Chairman), Prof. Dr.-Ing. Harald Bolt,<br>

> Prof. Dr. Sebastian M. Schmidt<br>

> ------------------------------------------------------------------------------------------------<br>

> ------------------------------------------------------------------------------------------------<br>

><br>

> Visit our new website at <a href="http://www.fz-juelich.de" target="_blank">www.fz-juelich.de</a><br>

> _______________________________________________<br>

> Lustre-discuss mailing list<br>

> <a href="mailto:Lustre-discuss@lists.lustre.org" target="_blank">Lustre-discuss@lists.lustre.org</a><br>

> <a href="http://lists.lustre.org/mailman/listinfo/lustre-discuss" target="_blank">http://lists.lustre.org/mailman/listinfo/lustre-discuss</a><br>

><br>

<br>


</div></div></blockquote></div><br>

</div></div><br>

<br></blockquote></div><br><br clear="all"><br>-- <br>cliffw<div>Support Guy</div><div>WhamCloud, Inc. </div><div><a href="http://www.whamcloud.com" target="_blank">www.whamcloud.com</a></div><div><br></div><br>

</div>