[Lustre-devel] SMP Scalability, MDS, reducing cpu pingpong
Nicolas.Williams at sun.com
Wed Jul 29 12:22:30 PDT 2009
On Wed, Jul 29, 2009 at 04:37:29PM +0100, Eric Barton wrote:
> > Also on lustre front - something I plan to tackle, though not yet
> > completely sure how: Lustre has a concept of reserving one thread for
> > difficult replies handling + one thread for high priority messages
> > handling (if enabled). In SMP scalability branch that becomes 2x
> > num_cpus reserved threads potentially per service since naturally
> > rep_ack reply or high prio message might arrive on any cpu separately
> > now (and message queues are per cpu) - seems like huge overkill to
> > me. I see that there are separate reply-handling threads in HEAD now,
> > so perhaps this could be greatly simplified by proper usage of those.
> > The high prio case seems to be harder to improve, though.
> These threads are required in case all normal service threads are
> blocking. I don't suppose this can be a performance critical case, so
> violating CPU affinity for the sake of deadlock avoidance seems OK.
> However, is 1 extra thread per CPU such a big deal? We'll have
> 10s-100s of them in any case.
Probably not. You could have a single thread per-CPU if everything was
written in async I/O, continuation passing style (CPS), blocking only in
an event loop per-CPU. That'd reduce context switches, but it'd
increase the amount of context being saved and read as that one thread
services each event/event completion. In other words, you'd still have
Also, the code would get insanely complicated -- CPS is for compilers,
not humans (nor do we have Scheme-like continuations in C nor in the
Linux kernel, and if we did that'd add quite a bit of run-time overhead
too). And kernels are not usually written this way either, so it may
not even be feasible. The thread model is just easier to code to.
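
To make the trade-off concrete, here is a minimal sketch of the single-thread, event-loop style, written in Python rather than kernel C purely for brevity. Everything in it is hypothetical and for illustration only: EventLoop, submit_io, and handle_request are invented names, and the "I/O" is simulated by queueing the continuation with a canned result. It shows one request handler split into continuations, with the per-request state (req_id, lock) carried in closures instead of on a thread stack.

```python
from collections import deque


class EventLoop:
    """Single-threaded loop standing in for 'one thread per CPU' (hypothetical)."""

    def __init__(self):
        self.ready = deque()   # (continuation, result) pairs whose I/O "completed"
        self.dispatches = 0    # number of continuations run so far

    def submit_io(self, result, continuation):
        # Simulated async I/O: instead of blocking, queue the continuation
        # together with the result it will be handed on completion.
        self.ready.append((continuation, result))

    def run(self):
        # In a real system this loop is the only place the thread would block.
        while self.ready:
            continuation, result = self.ready.popleft()
            self.dispatches += 1
            continuation(result)


def handle_request(loop, req_id, log):
    # Step 1: start a "lock" operation; the rest of the handler is a continuation.
    def after_lock(lock):
        # Step 2: start a "read"; req_id and lock are saved in the closure
        # and restored when after_read runs.
        def after_read(data):
            log.append((req_id, lock, data))

        loop.submit_io("data-%d" % req_id, after_read)

    loop.submit_io("lock-%d" % req_id, after_lock)


loop = EventLoop()
log = []
for req_id in range(3):
    handle_request(loop, req_id, log)
loop.run()
```

Even in this toy, a two-step handler turns into two nested functions, and every dispatch has to pick up the state of a different request. A threaded version would be a straight-line function with two blocking calls.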
> > Does anybody else have any extra thoughts for lustre side
> > improvements we can get off this?
> I think we need measurements to prove/disprove whether object affinity
> trumps client affinity.
If we have secure PTLRPC in the picture then client affinity is more
likely to trump object affinity: keys, key schedules, and sequence
number windows may add up to enough per-client state to matter. (Of course, we could have
multiple streams per-client, so that a client could be serviced by
multiple server CPUs.)
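
A rough way to picture the difference is to count how many CPUs end up touching each security context under each dispatch policy. The sketch below is a toy model in Python, not PTLRPC code: the function names, the modulo-based CPU selection, and the assumption that each stream carries its own context (keys, sequence number window) are all mine, for illustration only.

```python
from collections import defaultdict

NUM_CPUS = 4  # assumed CPU count for the toy model


def cpu_by_client(client_id, object_id, stream_id):
    # Client affinity: all of a client's requests go to one CPU.
    return client_id % NUM_CPUS


def cpu_by_object(client_id, object_id, stream_id):
    # Object affinity: requests follow the object, whoever sent them.
    return object_id % NUM_CPUS


def cpu_by_stream(client_id, object_id, stream_id):
    # Multiple streams per client: each (client, stream) pair goes to one CPU.
    return (client_id + stream_id) % NUM_CPUS


def cpus_touching_contexts(requests, pick_cpu, context_key):
    """Return {context: number of distinct CPUs that touch that context}."""
    touched = defaultdict(set)
    for client_id, object_id, stream_id in requests:
        cpu = pick_cpu(client_id, object_id, stream_id)
        touched[context_key(client_id, stream_id)].add(cpu)
    return {ctx: len(cpus) for ctx, cpus in touched.items()}


def per_client(client_id, stream_id):
    # One security context per client.
    return client_id


def per_stream(client_id, stream_id):
    # Assumption: one security context per (client, stream) pair.
    return (client_id, stream_id)


# Two clients, eight objects each, requests alternating between two streams.
requests = [(c, o, o % 2) for c in range(2) for o in range(8)]

client_affinity = cpus_touching_contexts(requests, cpu_by_client, per_client)
object_affinity = cpus_touching_contexts(requests, cpu_by_object, per_client)
stream_affinity = cpus_touching_contexts(requests, cpu_by_stream, per_stream)
```

In this model, object affinity has every CPU touching each client's context, while client affinity and per-stream contexts keep each context on a single CPU. With two streams, a client is still served by two CPUs. Whether that outweighs object locality is exactly what would need measuring.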