[lustre-discuss] confirming behavior we're seeing

Mohr Jr, Richard Frank (Rick Mohr) rmohr at utk.edu
Tue Dec 8 10:58:52 PST 2015


> On Dec 8, 2015, at 12:41 PM, John White <jwhite at lbl.gov> wrote:
> 
> A month or two ago we expanded a Lustre instance (added an OSS+OSTs to a fairly full file system).  Since then, we’ve seen IO patterns that heavily favor the new OSS/OSTs.  In the default allocation strategy, is this to be expected in a file system with heavily disparate free space among OSTs?
> 
> We don’t really have the luxury of rebalancing things (assuming the method for doing such is still “re-write/copy files on old OSTs and let the allocation strategy handle it”), unfortunately, so we’re just looking to confirm the behavior.

That is the expected default behavior, but if you find that too much I/O is going to the new OSTs, you might be able to tweak some Lustre knobs to adjust things.  If you take a look at section 18.5 in the Lustre manual, there are two parameters that affect how OSTs are allocated:  qos_threshold_rr and qos_prio_free.

The qos_threshold_rr parameter controls when Lustre switches between the QOS and round-robin allocators (which helps control the size of the gap between the most-used and least-used OST).  When the QOS allocator is being used, Lustre selects OSTs with a weighted random algorithm.  The qos_prio_free parameter controls how much weight is given to free space versus location (i.e., whether the OSTs are on different OSS nodes).

Those parameters can help you control how aggressively Lustre will allocate your new OSTs to files.
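
To make the interaction between the two knobs a bit more concrete, here is a rough Python sketch of the idea.  It is a toy model, not Lustre's actual allocator: the function names, the OST names, the free-space figures, and the parameter values are all made up for illustration, and the "location" term is replaced by a plain uniform weight (the real allocator looks at how objects are spread across OSTs and OSS nodes).

```python
import random

def pick_allocator(free_fractions, threshold_rr):
    """Toy rule: use round-robin while the free-space gap between the
    emptiest and fullest OST stays below threshold_rr, otherwise QOS."""
    gap = max(free_fractions.values()) - min(free_fractions.values())
    return "round-robin" if gap < threshold_rr else "qos"

def qos_weights(free_fractions, prio_free):
    """Toy weighting: blend a free-space-proportional weight with a
    uniform weight.  The uniform part is only a stand-in for the
    location term; it is NOT how Lustre computes it."""
    total_free = sum(free_fractions.values())
    n = len(free_fractions)
    return {
        ost: prio_free * free / total_free + (1.0 - prio_free) / n
        for ost, free in free_fractions.items()
    }

def pick_ost(free_fractions, prio_free, rng=random):
    """Weighted random choice of one OST using the toy weights."""
    weights = qos_weights(free_fractions, prio_free)
    osts = list(weights)
    return rng.choices(osts, weights=[weights[o] for o in osts], k=1)[0]

# Hypothetical file system: three nearly full old OSTs and one new, mostly empty OST.
free = {"OST0000": 0.10, "OST0001": 0.12, "OST0002": 0.08, "OST0003": 0.90}

print(pick_allocator(free, threshold_rr=0.17))
print(qos_weights(free, prio_free=0.91))
print(qos_weights(free, prio_free=0.50))
```

In this model, lowering prio_free flattens the weights, so the new OST is picked less overwhelmingly, while raising threshold_rr keeps the round-robin allocator in use across a larger free-space gap.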

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu


