[lustre-discuss] kernel threads for rpcs in flight

Anna Fuchs anna.fuchs at uni-hamburg.de
Mon Apr 29 01:36:33 PDT 2024


Hi Andreas.

Thank you very much, that helps a lot.
Sorry for the confusion, I primarily meant the client. The servers 
rarely have to compete with anything else for CPU resources, I guess.

> The mechanism to start new threads is relatively simple.  Before a 
> server thread is processing a new request, if it is the last thread 
> available, and not the maximum number of threads are running, then it 
> will try to launch a new thread; repeat as needed.  So the thread 
>  count will depend on the client RPC load and the RPC processing rate 
> and lock contention on whatever resources those RPCs are accessing.
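(Side note for the archive: if I understand correctly, those server-side thread limits can be inspected with lctl; the paths below are what I'd expect on a recent release, so treat them as a sketch rather than gospel.)

    # on an OSS: running / minimum / maximum ost_io service threads
    lctl get_param ost.OSS.ost_io.threads_started
    lctl get_param ost.OSS.ost_io.threads_min
    lctl get_param ost.OSS.ost_io.threads_max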
And what are the conditions on the client? Are the threads there driven by 
the workload of the application somehow?

Imagine an edge case where all but one core are pinned at 100% 
constant load and the remaining one is dumping RAM to Lustre. Presumably, the 
available core will be taken. But will Lustre or the kernel then spawn 
additional threads and try to somehow interleave them with those of the 
application, or will it simply handle it with 1-2 threads on the 
available core (assume single stream to single OST)? In any case, I 
suppose the I/O transfer would suffer under the resource shortage, but 
my question would be to what extent it would (try to) hinder the 
application. For latency-critical applications, such small delays can 
already lead to idle waves. And surely, the Lustre threads are usually 
not CPU-hungry, but they will be when it comes to encryption and compression.
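To make that scenario concrete, roughly what I have in mind (mount point, sizes and core counts are just placeholders):

    # keep every core except cpu0 busy with a pinned compute load
    taskset -c 1-$(( $(nproc) - 1 )) stress-ng --cpu $(( $(nproc) - 1 )) --timeout 300 &
    # meanwhile dump memory to Lustre from the one remaining core
    taskset -c 0 dd if=/dev/zero of=/mnt/lustre/dump.bin bs=1M count=16384 oflag=direct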

>> With max_rpcs_in_flight = 1, multiple cores are loaded, 
>> presumably alternately, but the statistics are too inaccurate to 
>> capture this.  The distribution of threads to cores is regulated by 
>> the Linux kernel, right? Does anyone have experience with what 
>> happens when all CPUs are under full load with the application or 
>> something else?
>
> Note that {osc,mdc}.*.max_rpcs_in_flight is a *per target* parameter, 
> so a single client can still have tens or hundreds of RPCs in flight 
> to different servers.  The client will send many RPC types directly 
> from the process context, since they are waiting on the result anyway. 
>  For asynchronous bulk RPCs, the ptlrpcd thread will try to process 
> the bulk IO on the same CPT (= Lustre CPU Partition Table, roughly 
> aligned to NUMA nodes) as the userspace application was running when 
> the request was created.  This minimizes the cross-NUMA traffic when 
> accessing pages for bulk RPCs, so long as those cores are not busy 
> with userspace tasks.  Otherwise, the ptlrpcd thread on another CPT 
> will steal RPCs from the queues.
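That per-target distinction is easy to check on the client side, e.g. (the target name in the last line is just a placeholder):

    # one value per OST/MDT target, so the aggregate number in flight can be much higher
    lctl get_param osc.*.max_rpcs_in_flight
    lctl get_param mdc.*.max_rpcs_in_flight
    # raise it for a single target only
    lctl set_param osc.testfs-OST0000-osc-*.max_rpcs_in_flight=16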
>
>> Do the Lustre threads suffer? Is there a prioritization of the Lustre 
>> threads over other tasks?
>
> Are you asking about the client or the server?  Many of the client 
> RPCs are generated by the client threads, but the running ptlrpcd 
> threads do not have a higher priority than client application threads. 
>  If the application threads are running on some cores, but other cores 
> are idle, then the ptlrpcd threads on other cores will try to process 
> the RPCs to allow the application threads to continue running there. 
>  Otherwise, if all cores are busy (as is typical for HPC applications) 
> then they will be scheduled by the kernel as needed.
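That at least suggests the placement can be spot-checked from userspace with standard tools, something like:

    # ptlrpcd kernel threads with the CPU each one last ran on (PSR) and its nice value
    ps -eLo pid,psr,ni,comm | grep ptlrpcd
    # per-thread CPU utilization over time (needs sysstat)
    pidstat -p $(pgrep -d, ptlrpcd) 1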
>
>> Are there readily available statistics or tools for this scenario?
>
> What statistics are you looking for?  There are "{osc,mdc}.*.stats" 
> and "{osc,mdc}.*.rpc_stats" that have aggregate information about RPC 
> counts and latency.
Oh, right, these tell a lot. Isn't there also something to log the 
utilization and location of these threads? Otherwise, I'll continue 
trying with perf, which seems to be more complex with kernel threads.
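For the record, what I'm looking at so far (the perf invocation is just my current attempt, not a recommendation):

    # per-target histograms of RPC sizes, RPCs in flight and latency; any write resets them
    lctl get_param osc.*.rpc_stats
    lctl set_param osc.*.rpc_stats=clear
    # sample the ptlrpcd kernel threads for 30 seconds
    perf record -g -p $(pgrep -d, ptlrpcd) -- sleep 30
    perf report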

Thanks for the explanations!

Anna
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud