[lustre-discuss] Round robin allocation (in general and in buggy 2.5.3)

Mohr Jr, Richard Frank (Rick Mohr) rmohr at utk.edu
Tue Dec 20 08:21:41 PST 2016

> On Dec 20, 2016, at 10:48 AM, Jessica Otey <jotey at nrao.edu> wrote:
> qos_threshold_rr
> This setting controls how much consideration should be given to QoS in allocation
> The higher this number, the more QOS is taken into consideration.
> When set to 100%, Lustre ignores the QoS variable and hits all OSTs equally

Lustre has two algorithms for allocating OSTs: round robin and QoS. Lustre chooses one or the other depending on how balanced the OST usage is, and the qos_threshold_rr parameter controls when Lustre considers the OST usage balanced. It has been a while since I have looked at this code, but I think this is how it works:

Assume $max is the maximum amount of free space on any OST in the file system and $min is the minimum amount of free space on any OST. If ($max - $min) <= (qos_threshold_rr/100) * $max, then the OSTs are considered balanced. Basically, this means that the free space on every OST is within a small window of the others (by default, 17% of $max). If qos_threshold_rr=100, the condition is always satisfied (since $min can never be negative), so Lustre treats the OSTs as always balanced.
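A minimal Python sketch of the balance check described above, assuming free space is given per OST in any consistent unit. The function names osts_balanced and choose_allocator are hypothetical, and this models only the free-space test as I described it; it is not Lustre's source, which may also look at free inodes and store the threshold differently internally.

```python
def osts_balanced(free_space, qos_threshold_rr=17):
    """Return True if the OSTs count as "balanced" under the rule above.

    Assumption: balanced when (max - min) <= (qos_threshold_rr / 100) * max.
    The comparison is multiplied through by 100 so integers stay exact.
    """
    max_free = max(free_space)
    min_free = min(free_space)
    return 100 * (max_free - min_free) <= qos_threshold_rr * max_free


def choose_allocator(free_space, qos_threshold_rr=17):
    """Pick the allocator: round robin when balanced, QoS otherwise."""
    if osts_balanced(free_space, qos_threshold_rr):
        return "round-robin"
    return "qos"
```

For example, with free space [100, 90, 95] the spread is 10, which is within 17% of 100, so choose_allocator returns "round-robin"; with [100, 50, 90] the spread is 50 and it returns "qos". Passing qos_threshold_rr=100 makes it return "round-robin" for any non-negative input.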

If the OSTs are “balanced”, Lustre uses the round-robin allocator to assign OSTs (regardless of how full they are). If they are unbalanced, Lustre uses the QoS allocator instead. The QoS allocator uses a weighted random mechanism to select OSTs: the least full OSTs have a greater chance of being allocated (in an attempt to bring the system back into balance), but there is still some chance that nearly full OSTs could be selected. The qos_prio_free parameter helps control the weighting factor in this decision (I think).
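Similarly, here is a simplified Python sketch of a weighted random pick in the spirit of the QoS allocator. The big assumption is that each OST's weight is just its free space; as far as I know the real allocator also applies qos_prio_free and penalties for recently used OSTs/OSS nodes, none of which is modelled here. qos_pick_osts is a hypothetical name.

```python
import random


def qos_pick_osts(free_space, stripe_count, rng=None):
    """Pick up to `stripe_count` distinct OST indices, weighted by free space.

    Simplified sketch: weight = free space only (no qos_prio_free, no
    penalties). An OST with little free space can still be chosen, just
    rarely; an OST with zero free space is never chosen.
    """
    rng = rng or random.Random()
    candidates = list(range(len(free_space)))
    chosen = []
    for _ in range(min(stripe_count, len(candidates))):
        weights = [free_space[i] for i in candidates]
        if sum(weights) <= 0:
            break  # nothing with free space left among the remaining OSTs
        idx = rng.choices(candidates, weights=weights, k=1)[0]
        chosen.append(idx)
        candidates.remove(idx)  # a stripe never uses the same OST twice
    return chosen
```

With free space [900, 100] and one stripe, OST 0 is picked roughly 90% of the time, yet the nearly full OST 1 is still chosen occasionally, which matches the behaviour described above.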

If you have an OST that is nearly full, you can “disable” it in the sense that the MDS will not choose it when assigning OSTs for new files, but clients can still read from it. In older versions (like 2.4/2.5), I believe the recommended way was to run “lctl --device <devno> deactivate” on the MDS node, where <devno> is the device number of that OST's OSC device as listed by “lctl dl”. If the OST usage drops, you can use “lctl --device <devno> activate” to re-enable it.

Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
