[lustre-discuss] NRS TBF by UID and congestion

Stephane Thiell sthiell at stanford.edu
Fri Oct 15 00:13:16 PDT 2021

Hi Diego!

Yes, we have been using NRS TBF by UID on our Oak storage system for months now with Lustre 2.12. It's a capacity-oriented, global filesystem, not designed for heavy workloads (unlike our scratch filesystem), but it has many users, which makes it a great candidate for NRS TBF by UID. Since enabling NRS, we have seen far fewer occurrences of a single user overloading the system (which is always by mistake, so we're helping them too!). We use NRS TBF by UID for all Lustre services on both the MDS and the OSSs.

We have an "exemption" rule, "root {0}", at a rate of 10000, and a default rule, "default {*}", at a lower value. The rate applies per user and per CPT (and per Lustre service; on the MDS, for example, mdt_readpage is a separate service). If you have large servers with many CPTs and set the value to 500, that's 500 req/s per CPT per user, which may still be too high to be useful. The ideal value also probably depends on your default striping and other specifics.
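For reference, a setup along those lines might look like the following on each OSS. This is a sketch based on the NRS TBF syntax in the Lustre Operations Manual, not a copy of our exact configuration; the rule names are arbitrary, and you should verify the syntax against your Lustre version before use.

```
# Enable the TBF policy keyed by UID on the ost_io service:
lctl set_param ost.OSS.ost_io.nrs_policies="tbf uid"

# Add an exemption rule for root (UID 0) with a high rate:
lctl set_param ost.OSS.ost_io.nrs_tbf_rule="start root_exempt uid={0} rate=10000"

# Lower the rate of the built-in default rule.
# Remember: the rate is per user AND per CPT.
lctl set_param ost.OSS.ost_io.nrs_tbf_rule="change default rate=500"
```

The same pattern applies on the MDS, once per MDT service (mds.MDS.mdt, mds.MDS.mdt_readpage, etc.).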

To get the NRS rate values right for the system, our approach is to monitor the active/queued values reported for the 'tbf uid' policy on each OSS with lctl get_param ost.OSS.ost_io.nrs_tbf_rule (and the same on the MDS for each MDT service). We record these instantaneous, gauge-like values every minute, which seems to be enough to see trends. The 'queued' number is the most useful to me, as I can easily see the impact of a rule by looking at the graph. Graphing these metrics over time lets us adjust the rates so that queueing is the exception rather than the norm, while still limiting heavy workloads.
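A minimal sketch of that per-minute collection is below. The exact output format of lctl varies between Lustre versions, so the sample text and the field names parsed here ("queued:", "active:") are assumptions; adapt the pattern to what your servers actually print before trusting the numbers.

```python
# Sketch of a per-minute NRS counter collector. The lctl output format and
# the "queued:"/"active:" field names are assumptions -- check your version.
import re
import subprocess


def read_nrs_counters(text):
    """Sum all queued/active counters found in a block of lctl output."""
    queued = sum(int(m) for m in re.findall(r"queued:\s*(\d+)", text))
    active = sum(int(m) for m in re.findall(r"active:\s*(\d+)", text))
    return queued, active


def poll(service="ost.OSS.ost_io"):
    # On a real OSS this shells out to lctl (hypothetical invocation):
    out = subprocess.run(
        ["lctl", "get_param", f"{service}.nrs_tbf_rule"],
        capture_output=True, text=True, check=True,
    ).stdout
    return read_nrs_counters(out)


if __name__ == "__main__":
    # Illustrative sample only; real output will differ.
    sample = """\
regular_requests:
  queued: 12
  active: 3
high_priority_requests:
  queued: 0
  active: 1
"""
    print(read_nrs_counters(sample))
```

Feeding the resulting pair into whatever metrics pipeline you already run (one scrape per minute is enough, in our experience) gives you the queued-over-time graph described above.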

So it's working for us on this system. The only thing now is that we would love a way to get additional NRS stats out of Lustre, for example the UIDs that have hit the rate limit over a given period.

Lastly, we tried to implement it on our scratch filesystem, but that is more difficult. If a user has heavy-duty jobs running on compute nodes that hit the rate limit, that user basically cannot transfer anything from a DTN or a login node (and will complain). I've opened LU-14567 to discuss wildcard support for "uid" in NRS TBF policy rules ('tbf' rather than 'tbf uid'), so that we could mix non-UID TBF rules with UID TBF rules. I don't know how hard that is to implement.

Hope that helps,


> On Oct 14, 2021, at 12:33 PM, Moreno Diego (ID SIS) <diego.moreno at id.ethz.ch> wrote:
> Hi Lustre friends,
> I'm wondering if someone has experience setting NRS TBF (by UID) on the OSTs (ost_io and ost services) in order to avoid congestion of the filesystem's IOPS or bandwidth. All my attempts over the last months have failed miserably: the result does not look like QoS when the system is under high load. Once the system is under high load, not even the TBF UID policy saves us from slow response times for every user. So far I have only tried setting it by UID, so that every user gets their fair share of bandwidth, and I have tried different rate values for the default rule (5'000, 1'000 and 500). We have Lustre 2.12 in our cluster.
> Maybe there's some other setting that needs throttling (I see a parameter, /sys/module/ptlrpc/parameters/tbf_rate, set to 10'000, that I could not find documented). Is there anything I'm missing about this feature?
> Regards,
> Diego
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
