[lustre-discuss] Round robin allocation (in general and in buggy 2.5.3)

Tue Dec 20 07:48:59 PST 2016

All,

I am looking for a more complete understanding of how the two settings 
qos_prio_free and qos_threshold_rr function together.

My current understanding, which may be inaccurate, is the following:

*qos_prio_free*
This setting controls how much Lustre prioritizes free space (versus 
location for the sake of performance) in allocation.
The higher this number, the more Lustre takes empty space on an OST into 
consideration for its allocation.
When set to 100%, Lustre uses ONLY empty space as the deciding factor 
for writes.

*qos_threshold_rr*
This setting controls how much consideration should be given to QoS in 
allocation.
The higher this number, the more QoS is taken into consideration.
When set to 100%, Lustre ignores the QoS variable and hits all OSTs equally.

I'm looking for several answers:

1) Is my basic understanding of the above settings correct?

2) How does Lustre deal with OSTs that are 100% full? I'm curious about 
this under two conditions.

2a) When you set qos_threshold_rr=100 -- meaning, go and hit all the 
OSTs the same amount.

On one of our Lustre 2.5.3 file systems, the allocator is not working (a 
known bug, although I couldn't say why it seems to behave fine on the 
other one), so we have configured qos_threshold_rr=100. Since 
our OSTs are pretty dramatically unbalanced, attempts to write to full 
OSTs have caused write failures. Data deletes 
have gotten us below 90% on all OSTs now, and while I can certainly take 
the fullest OSTs out of write mode if that is needed, it would seem 
to me that Lustre should, no matter what your qos_threshold_rr setting, 
treat OSTs that are 100% full differently, meaning it should no longer 
attempt to write to them. In short, this seems like a bug to me... 
although, granted, I suppose if you are overriding the allocator, it's 
caveat user at that point.

2b) When you set qos_threshold_rr != 100 -- meaning, the allocator is 
working

On the other Lustre 2.5.3 system, the system defaults 
(qos_prio_free=91%; qos_threshold_rr=17%) are hitting all the OSTs when 
I run my test*, so I have not changed them. Several of the OSTs in this 
file system are at 100%. I get that we are not seeing write failures 
because the allocator is not allocating to these OSTs as frequently, 
based on how full they are. But I know from my test that these OSTs are 
still in the mix... so that implies to me that it would be possible, 
although less likely, to see a write failure if a write stream is opened 
on one of the 100% OSTs. I'd love to be able to quantify that "less likely".

Basically, I guess my question is: is taking an OST out of write mode 
the only (or best) way of preventing the fs from attempting to write to 
it when it is nearly full?
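
For picking which OSTs would be candidates for that, here is a minimal 
sketch that filters `lfs df` output for OSTs at or above a usage 
threshold. The function name full_osts and the default of 90 are my own 
placeholders, and it assumes the usual `lfs df` layout in which Use% is 
the fifth column and OST rows end in [OST:n]. It only reports; it does 
not change anything.

```shell
# Hypothetical helper: reads `lfs df` output on stdin and prints the UUID
# and Use% of every OST at or above the given threshold (default 90).
# Assumes Use% is column 5 and OST rows contain "[OST:<index>]".
full_osts() {
  awk -v limit="${1:-90}" '
    /\[OST:[0-9]+\]/ {
      use = $5
      sub(/%/, "", use)                 # strip the trailing percent sign
      if (use + 0 >= limit + 0) print $1, $5
    }'
}
```

Usage would be something like `lfs df /mnt/lustre | full_osts 95`, where 
/mnt/lustre stands in for the client mount point.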

Thanks,
Jessica

------------------------------

*To test file allocation on your Lustre system, you can use this 
one-liner from a Lustre client. USE IT IN ITS OWN, NEW DIRECTORY!

touch t.{1..2000}; lfs getstripe t.* | fgrep -A1 obdidx | fgrep -v obdidx |
  fgrep -v -- -- | awk '{ print $1 }' | sort | uniq -c; rm -f t.*
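
As a rough way to put numbers on how often each OST is picked, the 
counts printed by the one-liner above can be turned into per-OST shares. 
This is only a sketch: ost_share is a name I made up, it expects the 
"count obdidx" lines that `uniq -c` produces, and the result is an 
empirical estimate from one test run, not what the allocator guarantees.

```shell
# Hypothetical helper: reads "count obdidx" lines (uniq -c output) on stdin
# and prints "obdidx count share%" for each OST, sorted by OST index.
ost_share() {
  awk '{ count[$2] = $1; total += $1 }
       END {
         for (idx in count)
           printf "%s %d %.1f%%\n", idx, count[idx], 100 * count[idx] / total
       }' |
    sort -n
}
```

To use it, put `| ost_share` right after `uniq -c` in the one-liner, 
before the `; rm -f t.*`.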

-- 
Jessica Otey
System Administrator II
North American ALMA Science Center (NAASC)
National Radio Astronomy Observatory (NRAO)
Charlottesville, Virginia (USA)
