[Lustre-discuss] stripe offset and hot-spots

Tue Nov 24 13:29:55 PST 2009

On 2009-11-24, at 12:17, John White wrote:
> 	So I'm trying to get a theoretical understanding of stripe offsets  
> in lustre.  As I understand it, the default offset set to 0 results  
> in all writes beginning at OSS0-OST0.  With a default stripe of 4,  
> doesn't this lead to massive hotspots on OSS0-OST[0-3] (unless *all*  
> writes are consistently large)?

As previously mentioned, the default is NOT to always start files with  
OST0, but rather to have a "round-robin with precession" (not random  
as is commonly mentioned) so that the OST used for stripe 0 of each  
file is evenly distributed among OSTs, regardless of the stripe count.
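
To make the idea concrete, here is a deliberately simplified sketch of
round-robin selection of the starting OST. It is not the MDS allocator:
it leaves out the precession and any space balancing, and the function
names and the OST count of 8 are made up for illustration. It only
shows that when the starting index advances by one per file, stripe 0
lands on every OST equally often, whatever the stripe count is.

```python
from collections import Counter


def stripe_layout(file_number, num_osts, stripe_count):
    """Return the OST indices used by one file (hypothetical model).

    The starting OST advances by one for each new file; the remaining
    stripes follow on consecutive OSTs, wrapping around at the end.
    """
    start = file_number % num_osts
    return [(start + j) % num_osts for j in range(stripe_count)]


def stripe0_histogram(num_files, num_osts, stripe_count):
    """Count how many files have their stripe 0 on each OST."""
    counts = Counter(
        stripe_layout(n, num_osts, stripe_count)[0] for n in range(num_files)
    )
    return [counts[i] for i in range(num_osts)]


if __name__ == "__main__":
    # 16 files on 8 OSTs: compare a stripe count of 1 with a stripe count of 4.
    print(stripe0_histogram(16, 8, 1))
    print(stripe0_histogram(16, 8, 4))
```

In this model both histograms are flat, so no OST is singled out as
the starting point, and OST0 gets no more stripe-0 objects than any
other OST.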

> 	With our setup, we have 4 OSTs per OSS (well, the last OSS has 3,  
> but that's not important right now).  This would appear, in theory,  
> to put OSS0 in a very hot situation.
>
> 	That said, I wonder how efficient a solution setting the stripe  
> offset of the root of the file system to -1 ("random") is to solving  
> this theoretical situation (given my understanding of striping under  
> lustre).

Well, -1 is already the default, unless someone at your site has
changed it at some point.  We generally recommend against ever
changing the starting index of files, since there is rarely a good
reason to do so.  The man page says:

         A start-ost of -1 allows the MDS to choose the starting
         index and it is strongly recommended, as this allows
         space and load balancing to be done by the MDS as needed.

> 	In reality, we have a quite varied workload on our file systems  
> with codes ranging from bio to astrophys and, as such, writes  
> ranging from very small to very large.  Any real-world experience  
> with these situations?  Are there strange inefficiencies or  
> administrative difficulties that should be known previous to  
> enabling "random" offsets?  Any info would be greatly appreciated.

It isn't random, specifically to avoid the case of non-uniform  
distribution when many clients are creating files at one time.  With  
random stripe-0 OST selection, it is inevitable that some OSTs get one  
or two more objects, and some OSTs get one or two fewer objects, and  
this can cause dramatic performance impacts.

For example, if the average is 2 objects per OST, but random object
distribution puts 4 objects on some OSTs and none on others, then the
application may see an aggregate performance drop of 50% or more,
because the job only finishes when the busiest OST does.  With
round-robin distribution, every OST will get 2 objects (assuming
objects / OSTs is a whole number).
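
The arithmetic behind that 50% figure can be sketched as follows. The
model is an assumption made for illustration: every OST has the same
bandwidth, every object is the same size, and the job is finished only
when the most heavily loaded OST is finished. The function names are
hypothetical and nothing here calls into Lustre.

```python
import random
from collections import Counter


def relative_throughput(objects_per_ost):
    """Aggregate throughput relative to a perfectly even layout.

    Under the assumptions above, run time is proportional to the
    largest per-OST object count, so throughput scales as mean / max.
    """
    mean = sum(objects_per_ost) / len(objects_per_ost)
    return mean / max(objects_per_ost)


def random_layout(num_osts, num_objects, rng):
    """Place each object on an OST chosen uniformly at random."""
    counts = Counter(rng.randrange(num_osts) for _ in range(num_objects))
    return [counts[i] for i in range(num_osts)]


def round_robin_layout(num_osts, num_objects):
    """Place objects on OSTs in strict rotation."""
    base, extra = divmod(num_objects, num_osts)
    return [base + (1 if i < extra else 0) for i in range(num_osts)]


if __name__ == "__main__":
    rng = random.Random()  # unseeded, so the random layout varies per run
    even = round_robin_layout(8, 16)
    uneven = random_layout(8, 16, rng)
    print("round-robin:", even, relative_throughput(even))
    print("random:     ", uneven, relative_throughput(uneven))
```

With an average of 2 objects per OST, a layout such as [4, 0, 2, 2]
gives relative_throughput() = 2 / 4 = 0.5, which is the 50% drop
described above, while the round-robin layout stays at 1.0.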

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.