[Lustre-discuss] Optimal strategy for OST distribution

Andreas Dilger adilger at whamcloud.com
Thu Mar 31 22:54:08 PDT 2011


Actually, the MDS will still assign OST indices in round-robin order unless the free space is more than 20% imbalanced. 

However, it will internally do an OSS-first ordering of the OSTs to ensure maximum spreading of load across the OSS nodes. 

For details see lustre/lov/lov_qos.c. 
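
As a rough illustration of the behaviour described above (a simplified sketch, not the code in lov_qos.c), the Python below builds the two OST-to-OSS layouts from the original question, interleaves the OSTs OSS-first, and picks stripes round-robin from that order. All function names are hypothetical, and the free-space imbalance measure, (max - min) / max, is an assumption made for the example; only the 20% figure comes from this message.

```python
from itertools import zip_longest


def layout_round_robin(num_oss, num_ost):
    """Option -a-: OST indices dealt out to the OSS nodes one at a time."""
    layout = {oss: [] for oss in range(1, num_oss + 1)}
    for ost in range(1, num_ost + 1):
        layout[(ost - 1) % num_oss + 1].append(ost)
    return layout


def layout_consecutive(num_oss, num_ost):
    """Option -b-: each OSS node gets a consecutive block of OST indices."""
    per_oss = -(-num_ost // num_oss)  # ceiling division
    layout = {oss: [] for oss in range(1, num_oss + 1)}
    for ost in range(1, num_ost + 1):
        layout[(ost - 1) // per_oss + 1].append(ost)
    return layout


def oss_first_order(layout):
    """Interleave OSTs so that neighbours in the list sit on different OSS nodes.

    Takes the first OST of every OSS, then the second of every OSS, and so on.
    """
    order = []
    for row in zip_longest(*(layout[oss] for oss in sorted(layout))):
        order.extend(ost for ost in row if ost is not None)
    return order


def space_is_balanced(free_bytes, threshold=0.20):
    """Assumed imbalance test: (max - min) / max free space within threshold."""
    most, least = max(free_bytes), min(free_bytes)
    if most == 0:
        return True
    return (most - least) / most <= threshold


def pick_stripes(order, start, stripe_count):
    """Round-robin selection of stripe_count OSTs from the OSS-first order."""
    return [order[(start + i) % len(order)] for i in range(stripe_count)]


if __name__ == "__main__":
    for name, build in (("a", layout_round_robin), ("b", layout_consecutive)):
        order = oss_first_order(build(3, 9))
        print(name, order, pick_stripes(order, 0, 3))
```

With three OSS nodes and nine OSTs, a three-stripe file lands on three different OSS nodes under either layout in this sketch, which is why the numbering convention matters less once the ordering is done OSS-first.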

Cheers, Andreas

On 2011-03-31, at 5:06 AM, Kevin Van Maren <kevin.van.maren at oracle.com> wrote:

> It used to be that multi-stripe files were created with sequential OST 
> indexes.  It also used to be that OST indexes were sequentially assigned 
> to newly-created files.
> As Lustre now adds greater randomization, the strategy for assigning 
> OSTs to OSS nodes (and storage hardware, which often limits the 
> aggregate performance of multiple OSTs) is less important.
> 
> While I have normally gone with "a", "b" can make it easier to remember 
> where OSTs are located, and also keep a uniform convention if the 
> storage system is later grown.
> 
> Kevin
> 
> 
> Heckes, Frank wrote:
>> Hi all,
>> 
>> sorry if this question has been answered before.
>> 
>> What is the optimal 'strategy' for assigning OSTs to OSS nodes?
>> 
>> -a- Assign OST via round-robin to the OSS
>> -b- Assign in consecutive order (as long as the backend storage provides
>>    enough capacity for IOPS and bandwidth)
>> -c- Something 'in-between' the 'extremes' of -a- and -b-
>> 
>> E.g.:
>> 
>> -a-     OSS_1           OSS_2           OSS_3
>>          |_              |_              |_
>>            OST_1           OST_2           OST_3
>>            OST_4           OST_5           OST_6
>>            OST_7           OST_8           OST_9
>> 
>> -b-     OSS_1           OSS_2           OSS_3
>>          |_              |_              |_
>>            OST_1           OST_4           OST_7
>>            OST_2           OST_5           OST_8
>>            OST_3           OST_6           OST_9
>> 
>> I thought -a- would be best for both task-local I/O (each task writes to
>> its own file) and single-file I/O (all tasks write to a single file),
>> since it is like the RAID-0 approach used for disk I/O (and Sun created
>> our first FS this way).
>> Has anyone made any systematic investigation into which approach is best,
>> or does anyone have an educated opinion?
>> Many thanks in advance.
>> BR
>> 
>> -Frank Heckes
>> 
>> Forschungszentrum Juelich GmbH
>> 52425 Juelich
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>> 
> 


