[lustre-discuss] Round robin allocation (in general and in buggy 2.5.3)
Jessica Otey
jotey at nrao.edu
Tue Dec 20 07:48:59 PST 2016
All,
I am looking for a more complete understanding of how the two settings
qos_prio_free and qos_threshold_rr function together.
My current understanding, which may be inaccurate, is the following:
qos_prio_free
This setting controls how much Lustre prioritizes free space (versus
location for the sake of performance) in allocation.
The higher this number, the more weight Lustre gives to an OST's free
space when deciding where to allocate.
When set to 100%, Lustre uses ONLY free space as the deciding factor
for writes.
qos_threshold_rr
This setting controls when the allocator switches between round-robin
and QoS-weighted allocation.
The higher this number, the more often plain round-robin is used
instead of QoS.
When set to 100%, Lustre ignores the QoS weighting entirely and hits
all OSTs equally.
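To check that I'm reading the interaction right, here is the toy
decision rule I have in my head (my own sketch and naming, not the
actual allocator code):

```shell
# Toy model of the switch (my assumption, not the real Lustre code path):
# stay in round-robin while the free-space imbalance across OSTs is below
# qos_threshold_rr percent, otherwise fall back to QoS weighting.
choose_policy() {  # usage: choose_policy THRESHOLD_PCT FREE1 FREE2 ...
  thr=$1; shift
  lo=$1; hi=$1
  for f in "$@"; do
    if [ "$f" -lt "$lo" ]; then lo=$f; fi
    if [ "$f" -gt "$hi" ]; then hi=$f; fi
  done
  imbalance=$(( 100 * (hi - lo) / hi ))
  if [ "$imbalance" -lt "$thr" ]; then echo round-robin; else echo qos; fi
}

choose_policy 17 90 95 100    # round-robin (10% imbalance, under the 17% default)
choose_policy 17 10 95 100    # qos (90% imbalance)
choose_policy 100 10 95 100   # round-robin (100 effectively disables QoS)
```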
I'm looking for several answers:
1) Is my basic understanding of the above settings correct?
2) How does Lustre deal with OSTs that are 100% full? I'm curious about
this under two conditions.
2a) When you set qos_threshold_rr=100 -- meaning, go and hit all the
OSTs the same amount.
On one of our 2.5.3 Lustre filesystems, the allocator is not working (a
known bug, but why it seems to be behaving fine on the other one, I
couldn't say...) and so we have configured qos_threshold_rr=100. Since
our OSTs are pretty dramatically unbalanced, it has happened that
attempts to write to full OSTs have caused write failures. Data deletes
have gotten us below 90% on all OSTs now, and while I can certainly take
the fullest OSTs out of write mode if that is needed, it would seem to
me that Lustre should, no matter what your qos_threshold_rr setting,
treat OSTs that are 100% full differently, meaning it should no longer
attempt to write to them. In short, this seems like a bug to me...
although, granted, I suppose if you are overriding the allocator, it's
caveat user at that point.
2b) When you set qos_threshold_rr != 100 -- meaning, the allocator is
working
On the other Lustre 2.5.3 system, the system defaults
(qos_prio_free=91%; qos_threshold_rr=17%) are hitting all the OSTs when
I run my test*, so I have not changed them. Several of the OSTs in this
file system are at 100%. I get that we are not seeing write failures
because the allocator is not allocating to these OSTs as frequently,
based on how full they are. But I know from my test that these OSTs are
still in the mix... so that implies to me that it would be possible,
although less likely, to see a write failure if a write stream is opened
on one of the 100% OSTs. I'd love to be able to quantify that "less likely".
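For example, here is a crude model of "less likely" (assuming, purely
for illustration, that the chance of an object landing on an OST is
proportional to its free space; the real weighting also involves
qos_prio_free and OSS location, so treat this as rough intuition only):

```shell
# Crude estimate: nine OSTs with 500 GB free and one nearly full with 5 GB.
# If allocation chance were simply proportional to free space, the full
# OST would still receive a small share of new objects.
free="500 500 500 500 500 500 500 500 500 5"
pct=$(echo "$free" | awk '{ t = 0
                            for (i = 1; i <= NF; i++) t += $i
                            printf "%.2f", 100 * $NF / t }')
echo "$pct"   # 0.11 -- about one object in 900 still lands on the full OST
```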
Basically, I guess my question is: is taking an OST out of write mode
the only (or best) way of preventing the fs from attempting to write to
it when it is nearly full?
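For concreteness, when I say "take an OST out of write mode," I mean
something like the following on the MDS ("lustre" and "OST0002" are
placeholder names; substitute your fsname and OST index):

```shell
# Stop new object creation on one OST while leaving it active:
lctl set_param osp.lustre-OST0002-osc-MDT0000.max_create_count=0

# Or deactivate it for new allocations entirely:
lctl --device lustre-OST0002-osc-MDT0000 deactivate
# ...and later re-enable it with:
lctl --device lustre-OST0002-osc-MDT0000 activate
```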
Thanks,
Jessica
------------------------------
*To test file allocation on your lustre system, you can use this
one-liner from a lustre client. USE IT IN ITS OWN, NEW DIRECTORY!
touch t.{1..2000}; lfs getstripe t.* | fgrep -A1 obdidx | fgrep -v obdidx \
  | fgrep -v -- -- | awk '{ print $1 }' | sort | uniq -c; rm -f t.*
--
Jessica Otey
System Administrator II
North American ALMA Science Center (NAASC)
National Radio Astronomy Observatory (NRAO)
Charlottesville, Virginia (USA)