[Lustre-discuss] Performance versus allocation balance(?)

Andreas Dilger adilger at whamcloud.com
Mon Jun 27 12:57:51 PDT 2011

On 2011-06-27, at 3:25 AM, Frank Heckes wrote:
> (many thanks to J. Lombardi directing our intention to this feature)
> we noticed that a filesystem based on
>  2 x  (4 x OSS + 1 SFA 10000)
>  SLES11 + Lustre 1.8.4
> slowed down drastically from 19.2 GB/s write throughput to 2.9 GB/s.
> Of course the environment wasn't changed. ;-) Measurements were taken on
> a nearly empty file system (19.2 GB/s) and again at the current state,
> 70% full (2.9 GB/s). All measurements were performed exclusively during
> system maintenance, i.e. no other applications were using the cluster or
> the storage devices.
> While trying to find a way to reach the 'old' value again, a test that
> changed the parameter 'qos_threshold_rr' to 100% led to the desired
> performance.
> Upon checking the performance counters on the OSS and the SFA side we
> noticed that with the default qos_threshold_rr setting (16%) the
> distribution of objects was not even: a large number of OSTs were
> completely inactive. With 100% we found, as expected, an even load
> distribution.

This dynamic load balancing is intended to even out OST usage when free
space is imbalanced between OSTs.  I haven't heard of it having such a
significant impact on performance in the past, however.

Do you have imbalanced free space on the OSTs?  If so, how much, and do
you know how it became imbalanced in the past?
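A quick way to check for such an imbalance is to compare the per-OST Use%
columns from `lfs df`.  Here is a minimal sketch of that check; the sample
output, filesystem name, and numbers below are made up for illustration (on
a live client you would feed in the real output of `lfs df /mnt/lustre`):

```python
# Sketch: measure free-space imbalance across OSTs from `lfs df` output.
# SAMPLE is a fabricated example of the usual `lfs df` column layout;
# substitute real output on an actual Lustre client.
SAMPLE = """\
UUID                 1K-blocks        Used   Available Use% Mounted on
lustre-MDT0000_UUID   52428800     2097152    50331648   4% /mnt/lustre[MDT:0]
lustre-OST0000_UUID 1048576000   734003200   314572800  70% /mnt/lustre[OST:0]
lustre-OST0001_UUID 1048576000   943718400   104857600  90% /mnt/lustre[OST:1]
lustre-OST0002_UUID 1048576000   524288000   524288000  50% /mnt/lustre[OST:2]
"""

def ost_usage(lfs_df_output):
    """Return {ost_uuid: percent_used} for the OST lines only."""
    usage = {}
    for line in lfs_df_output.splitlines():
        fields = line.split()
        if len(fields) >= 5 and "OST" in fields[0]:
            usage[fields[0]] = int(fields[4].rstrip("%"))
    return usage

usage = ost_usage(SAMPLE)
# A spread larger than the qos_threshold_rr setting (16% by default in the
# report above) is the kind of imbalance that pushes the allocator out of
# pure round-robin mode.
spread = max(usage.values()) - min(usage.values())
print(usage)
print("imbalance: %d percentage points" % spread)
```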

> Can anyone confirm this observation?
> Is this really a 'feature' of Lustre, and if so, are there plans to
> 're-design' the object allocation part, so that
> -a- full bandwidth can be reached with the help of RR

There is a bug open that discusses how to improve the OST space balancing
using the round-robin allocator to avoid the dramatic performance loss
that you are seeing.  In https://bugzilla.lustre.org/show_bug.cgi?id=18547
there is a discussion of the issues and possible solutions, but work has
not progressed recently.
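For reference, the workaround described above can be inspected and applied
at runtime with lctl.  This is a sketch for 1.8-era releases; the exact
parameter path can vary between Lustre versions, so check your local proc
tree first:

```shell
# Show the current QOS/round-robin threshold (runs on the MDS for 1.8-era
# systems; parameter naming may differ in other Lustre versions).
lctl get_param lov.*.qos_threshold_rr

# Force pure round-robin allocation, as in the test that restored the
# 19.2 GB/s figure.  Note this trades free-space balancing for bandwidth,
# so OSTs may fill unevenly over time.
lctl set_param lov.*.qos_threshold_rr=100
```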

> -b- allocation balancing is done by a 'background' thread that, some
>    time later, shifts objects to OSTs with lower object allocation, in
>    accordance with the striping policy of the file(s)?

This is a separate issue, and depends on at least part of the HSM feature
landing in order to provide the "layout lock" functionality to allow files
to change the OST objects over which they are striped in a safe way.  The
HSM policy engine is also well suited to doing background space management.

It is always better to do proper layout in the first place ("a" above) to
avoid the need to move data around afterward, so I think both approaches
are useful to implement.
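Until such a background service exists, manual rebalancing can be
approximated from a client with the lfs_migrate script shipped with the
Lustre tools.  Note this is a copy-and-rename workaround, not the
layout-lock-based migration discussed above, so it is not safe for files
that are in use; the OST index and mount point below are made-up examples:

```shell
# Temporarily deactivate the full OST on the MDS so new objects are
# allocated elsewhere while files are being moved off it.
lctl set_param osc.lustre-OST0001-osc.active=0

# From a client, find the files with objects on that OST and re-copy them,
# letting the allocator place the new objects on emptier OSTs.
lfs find /mnt/lustre --obd lustre-OST0001_UUID -type f | lfs_migrate -y

# Reactivate the OST once it has drained to a comparable usage level.
lctl set_param osc.lustre-OST0001-osc.active=1
```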

Cheers, Andreas
Andreas Dilger 
Principal Engineer
Whamcloud, Inc.
