[Lustre-discuss] stripe offset and hot-spots

Wed Nov 25 03:12:43 PST 2009

We are running Lustre 1.6 with 12TB of total space. We set both stripe
offset and stripe count to -1, with a stripe size of 2MB.
The data on the filesystem ranges from very small to very large files. A
few days ago we observed write failures on the filesystem in spite of having
1.2TB of space available (as reported by df). The problem was that 2 of the
OSTs were 100% full.
So can we conclude that, more often than not, those 2 OSTs were chosen as
the start offset for small files (<= 2MB)?
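
To make the arithmetic behind this question concrete, here is a minimal
sketch of how one file's bytes spread over OSTs under plain RAID-0 style
striping. The function name and the assumption that a file's objects sit on
consecutive OST indices after the starting one are illustrative only, not
taken from the Lustre code.

```python
MB = 1 << 20

def bytes_per_ost(file_size, stripe_size, stripe_count, start_ost, num_osts):
    """Bytes stored on each OST for one file (hypothetical helper).

    Assumes stripe i of the file lives on OST (start_ost + i) % num_osts,
    which is a simplification of real object allocation.
    """
    usage = [0] * num_osts
    full_chunks, tail = divmod(file_size, stripe_size)
    total_chunks = full_chunks + (1 if tail else 0)
    for chunk in range(total_chunks):
        size = stripe_size if chunk < full_chunks else tail
        ost = (start_ost + chunk % stripe_count) % num_osts
        usage[ost] += size
    return usage

# A 1MB file with 2MB stripes touches only its starting OST.
small = bytes_per_ost(1 * MB, 2 * MB, 4, 1, 4)
# A 10MB file wraps around and puts a second chunk on the starting OST.
large = bytes_per_ost(10 * MB, 2 * MB, 4, 1, 4)
```

Any file no larger than one stripe places all of its data on the starting
OST, whatever the stripe count. This does not by itself show why those two
OSTs filled up; it only shows that for small files the choice of starting
OST fully determines where the space is consumed.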

On Wed, Nov 25, 2009 at 2:59 AM, Andreas Dilger <adilger at sun.com> wrote:

> On 2009-11-24, at 12:17, John White wrote:
> >       So I'm trying to get a theoretical understanding of stripe offsets
> > in lustre.  As I understand it, the default offset set to 0 results
> > in all writes beginning at OSS0-OST0.  With a default stripe of 4,
> > doesn't this lead to massive hotspots on OSS0-OST[0-3] (unless *all*
> > writes are consistently large)?
>
> As previously mentioned, the default is NOT to always start files with
> OST0, but rather to have a "round-robin with precession" (not random
> as is commonly mentioned) so that the OST used for stripe 0 of each
> file is evenly distributed among OSTs, regardless of the stripe count.
>
> >       With our setup, we have 4 OSTs per OSS (well, the last OSS has 3,
> > but that's not important right now).  This would appear, in theory,
> > to put OSS0 in a very hot situation.
> >
> >       That said, I wonder how efficient a solution setting the stripe
> > offset of the root of the file system to -1 ("random") is to solving
> > this theoretical situation (given my understanding of striping under
> > lustre).
>
> Well, that is already the default, unless it has been changed at some
> time in the past by someone at your site.  We generally recommend
> against ever changing the starting index of files, since there are
> rarely good reasons to change this.  The man page says:
>
>         A start-ost of -1 allows the MDS to choose the starting
>         index and it is strongly recommended, as this allows
>         space and load balancing to be done by the MDS as needed.
>
> >       In reality, we have a quite varied workload on our file systems
> > with codes ranging from bio to astrophys and, as such, writes
> > ranging from very small to very large.  Any real-world experience
> > with these situations?  Are there strange inefficiencies or
> > administrative difficulties that should be known prior to
> > enabling "random" offsets?  Any info would be greatly appreciated.
>
>
> It isn't random, specifically to avoid the case of non-uniform
> distribution when many clients are creating files at one time.  With
> random stripe-0 OST selection, it is inevitable that some OSTs get one
> or two more objects, and some OSTs get one or two fewer objects, and
> this can cause dramatic performance impacts.
>
> For example, if the average objects per OST is 2, but some OSTs get 4
> objects and others get no objects then the application may see an
> aggregate performance drop of 50% or more, if it were using random
> object distribution.  With round-robin distribution, every OST will
> get 2 objects (assuming objects / OSTs is a whole number).
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>
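
The round-robin versus random argument quoted above can be sketched with a
toy model. It assumes a parallel job finishes only when its most heavily
loaded OST finishes, so relative throughput is the average object count
divided by the maximum. That model, the function names, and the omission of
the "precession" step are all simplifications for illustration.

```python
import random

def round_robin(num_objects, num_osts, start=0):
    """Object count per OST when objects are handed out in strict rotation."""
    counts = [0] * num_osts
    for i in range(num_objects):
        counts[(start + i) % num_osts] += 1
    return counts

def random_placement(num_objects, num_osts, rng):
    """Object count per OST when each object picks an OST uniformly at random."""
    counts = [0] * num_osts
    for _ in range(num_objects):
        counts[rng.randrange(num_osts)] += 1
    return counts

def relative_throughput(counts):
    """Average load divided by the busiest OST's load (1.0 means perfectly even)."""
    return (sum(counts) / len(counts)) / max(counts)

even = round_robin(16, 8)                             # 2 objects on every OST
skewed = random_placement(16, 8, random.Random(0))    # typically uneven
```

With an average of 2 objects per OST, a placement in which one OST holds 4
objects gives a relative throughput of 0.5 in this model, which corresponds
to the 50% drop described in the quoted message.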

-- 
Regards--
Rishi Pathak
National PARAM Supercomputing Facility
Center for Development of Advanced Computing(C-DAC)
Pune University Campus,Ganesh Khind Road
Pune-Maharastra