[lustre-discuss] kernel threads for rpcs in flight

Andreas Dilger adilger at whamcloud.com
Sun Apr 28 20:37:58 PDT 2024


On Apr 28, 2024, at 16:54, Anna Fuchs via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:

The setting max_rpcs_in_flight affects, among other things, how many threads can be spawned simultaneously for processing the RPCs, right?

The {osc,mdc}.*.max_rpcs_in_flight parameters actually control the maximum number of RPCs a *client* will have in flight to any single MDT or OST, while the number of MDS and OSS threads is controlled on the server with mds.MDS.mdt*.threads_{min,max} and ost.OSS.ost*.threads_{min,max} for each of the various service portals (which are selected by the client based on the RPC type).  max_rpcs_in_flight allows multiple client threads to have operations in progress concurrently, which hides network latency and improves server utilization, without allowing a single client to overwhelm the server.
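
For example (just a sketch; exact parameter names can differ between Lustre versions, so verify with "lctl list_param -R" on your system), the client and server sides can be inspected and tuned with lctl:

  # client: RPC concurrency per OST target
  lctl get_param osc.*.max_rpcs_in_flight
  lctl set_param osc.*.max_rpcs_in_flight=16

  # OSS: thread limits for the bulk IO service
  lctl get_param ost.OSS.ost_io.threads_min ost.OSS.ost_io.threads_max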

In tests where the network is clearly the bottleneck, this setting has almost no effect: the network cannot keep up with the data, so there is not much to do in parallel.
With a faster network, the stats show higher CPU utilization on different cores (at least on the client).

What is the exact mechanism that decides when a kernel thread is spawned to process a bulk RPC? Is there an RPC queue with timings or something similar?
Is it in any way predictable or calculable how many threads a specific workload will require (or spawn, if possible), given the data rates of the network and storage devices?

The mechanism to start new threads is relatively simple.  Before a server thread starts processing a new request, if it is the last available thread and the maximum number of threads is not yet running, it will try to launch a new thread; repeat as needed.  So the thread count will depend on the client RPC load, the RPC processing rate, and lock contention on whatever resources those RPCs are accessing.
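
If you want to watch this in practice on the OSS (assuming the usual ost_io service for bulk IO; the other services have their own counters):

  # number of service threads currently started for the bulk IO service
  lctl get_param ost.OSS.ost_io.threads_started
  # compare against the configured limits
  lctl get_param ost.OSS.ost_io.threads_min ost.OSS.ost_io.threads_max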

With max_rpcs_in_flight = 1, multiple cores are loaded, presumably alternately, but the statistics are too inaccurate to capture this.  The distribution of threads to cores is regulated by the Linux kernel, right? Does anyone have experience with what happens when all CPUs are under full load with the application or something else?


Note that {osc,mdc}.*.max_rpcs_in_flight is a *per target* parameter, so a single client can still have tens or hundreds of RPCs in flight to different servers.  The client will send many RPC types directly from the process context, since they are waiting on the result anyway.  For asynchronous bulk RPCs, the ptlrpcd thread will try to process the bulk IO on the same CPT (= Lustre CPU Partition Table, roughly aligned to NUMA nodes) as the userspace application was running when the request was created.  This minimizes the cross-NUMA traffic when accessing pages for bulk RPCs, so long as those cores are not busy with userspace tasks.  Otherwise, the ptlrpcd thread on another CPT will steal RPCs from the queues.
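
As an illustration (assuming default module option names on a reasonably current release), the CPT layout and ptlrpcd setup can be checked via module parameters:

  # number of CPU partitions the node was split into
  cat /sys/module/libcfs/parameters/cpu_npartitions
  # explicit CPU-to-partition pattern, if one was configured
  cat /sys/module/libcfs/parameters/cpu_pattern
  # ptlrpcd threads per CPT and binding policy, if present in your version
  cat /sys/module/ptlrpc/parameters/ptlrpcd_per_cpt_max 2>/dev/null
  cat /sys/module/ptlrpc/parameters/ptlrpcd_bind_policy 2>/dev/null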

Do the Lustre threads suffer? Is there a prioritization of the Lustre threads over other tasks?

Are you asking about the client or the server?  Many of the client RPCs are generated directly by the client threads, but the ptlrpcd threads do not have a higher priority than client application threads.  If the application threads are running on some cores while other cores are idle, then the ptlrpcd threads on the idle cores will try to process the RPCs so the application threads can keep running where they are.  Otherwise, if all cores are busy (as is typical for HPC applications), then they will be scheduled by the kernel as needed.
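
If you want to see how the ptlrpcd threads compete with the application, standard tools are enough; for example (a hypothetical invocation, assuming the kernel threads are named ptlrpcd_<cpt>_<index> as on current releases):

  # per-thread CPU usage of the ptlrpcd daemons, sampled once per second
  pidstat -t -p "$(pgrep -d, ptlrpcd)" 1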

Are there readily available statistics or tools for this scenario?

What statistics are you looking for?  There are "{osc,mdc}.*.stats" and "{osc,mdc}.*.rpc_stats" that have aggregate information about RPC counts and latency.
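
For example, on the client (the output layout may differ a bit between versions):

  # histogram of RPCs in flight and pages per bulk RPC
  lctl get_param osc.*.rpc_stats
  # aggregate request counts and latency per RPC type
  lctl get_param osc.*.stats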

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud






