[lustre-discuss] kernel threads for rpcs in flight

Sun Apr 28 15:54:34 PDT 2024

Hello everyone.

The setting max_rpcs_in_flight affects, among other things, how many
threads can be spawned simultaneously for processing the RPCs, right?
In tests where the network is clearly the bottleneck, this setting has
almost no effect: the network cannot deliver the data fast enough, so
there is not much to process in parallel.
With a faster network, the stats show higher CPU utilization on 
different cores (at least on the client).

What is the exact mechanism by which it is decided that a kernel thread 
is spawned for processing a bulk? Is there an RPC queue with timings or 
something similar?
Is it in any way predictable or calculable how many threads a specific
workload will require (and spawn, if possible), given the data rates of
the network and the storage devices?

With max_rpcs_in_flight = 1, multiple cores are loaded, presumably
alternately, but the statistics are too inaccurate to capture this.
The distribution of threads to cores is regulated by the Linux kernel, 
right? Does anyone have experience with what happens when all CPUs are 
under full load with the application or something else?
Do the Lustre threads suffer? Is there a prioritization of the Lustre 
threads over other tasks?
Are there readily available statistics or tools for this scenario?
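
To make the per-core question concrete, here is a minimal sketch of how
load on individual cores could be sampled at short intervals. It assumes
a Linux client and only reads the standard /proc/stat counters; the
function names (parse_proc_stat, per_core_busy, sample_live) are
hypothetical helpers, not part of Lustre or any existing tool, and this
says nothing about which threads cause the load.

```python
import time


def parse_proc_stat(text):
    """Return {cpu_name: (busy_jiffies, total_jiffies)} for each per-core line."""
    samples = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines look like:
        #   cpu0 user nice system idle iowait irq softirq steal guest guest_nice
        # The aggregate "cpu" line and all non-CPU lines are skipped.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # Use the first eight counters; guest time is already included in user.
        values = [int(v) for v in fields[1:9]]
        values += [0] * (8 - len(values))  # older kernels report fewer columns
        idle = values[3] + values[4]       # idle + iowait
        total = sum(values)
        samples[fields[0]] = (total - idle, total)
    return samples


def per_core_busy(before, after):
    """Fraction of time each core was busy between two parsed samples."""
    result = {}
    for cpu, (busy1, total1) in after.items():
        if cpu not in before:
            continue
        busy0, total0 = before[cpu]
        elapsed = total1 - total0
        result[cpu] = (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return result


def sample_live(interval=0.1):
    """Take two /proc/stat snapshots `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = parse_proc_stat(f.read())
    return per_core_busy(before, after)
```

Calling sample_live(0.1) in a loop during a run with
max_rpcs_in_flight = 1 would show whether the busy core changes from one
interval to the next. The resolution is still limited by the kernel tick,
so very short bursts may not be visible.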

Thanks a lot
Anna
--
Anna Fuchs
Universität Hamburg
Department of Computer Science
Research Group Scientific Computing

Bundesstraße 45a
D-20146 Hamburg
