[Lustre-discuss] NUMA IO and Lustre

Sébastien Buisson sebastien.buisson at bull.net
Wed May 13 03:56:24 PDT 2009


Andreas Dilger wrote:
> Note that the OST threads are already bound to a particular NUMA node.
> This means that the pages used for the IO are CPU-local and are not
> accessed from a remote CPU's cache.

Indeed, I have seen in lustre/ptlrpc/service.c the following piece of code:

#if defined(HAVE_NODE_TO_CPUMASK) && defined(CONFIG_NUMA)
         /* we need to do this before any per-thread allocation is done so that
          * we get the per-thread allocations on local node.  bug 7342 */
         if (svc->srv_cpu_affinity) {
                 int cpu, num_cpu;

                 for (cpu = 0, num_cpu = 0; cpu < num_possible_cpus(); cpu++) {
                         if (!cpu_online(cpu))
                                 continue;
                         if (num_cpu == thread->t_id % num_online_cpus())
                                 break;
                         num_cpu++;
                 }
                 set_cpus_allowed(cfs_current(), node_to_cpumask(cpu_to_node(cpu)));
         }
#endif
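
As I read it, the loop above picks the (t_id % num_online_cpus())-th
online CPU, and the thread is then allowed to run on every CPU of that
CPU's node. Here is a minimal userspace sketch of that selection, only to
illustrate the logic. The names pick_cpu_for_thread and toy_cpu_to_node
are mine, and the "fixed number of consecutive CPUs per node" topology is
an assumption for the example, not something the kernel guarantees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return the index of the (t_id % n_online)-th online
 * CPU, counting online CPUs from 0. online[i] is true when CPU i is online;
 * n_possible is the length of the array. Returns -1 if no CPU is online. */
int pick_cpu_for_thread(const bool *online, size_t n_possible, unsigned t_id)
{
    size_t n_online = 0;
    size_t target, seen = 0;
    size_t cpu;

    for (cpu = 0; cpu < n_possible; cpu++)
        if (online[cpu])
            n_online++;
    if (n_online == 0)
        return -1;

    target = t_id % n_online;
    for (cpu = 0; cpu < n_possible; cpu++) {
        if (!online[cpu])
            continue;           /* offline CPUs are skipped, not counted */
        if (seen == target)
            return (int)cpu;
        seen++;
    }
    return -1;                  /* not reached when n_online > 0 */
}

/* Assumed toy topology: cpus_per_node consecutive CPU ids per NUMA node.
 * Real machines may number CPUs differently. */
int toy_cpu_to_node(int cpu, int cpus_per_node)
{
    assert(cpu >= 0 && cpus_per_node > 0);
    return cpu / cpus_per_node;
}
```

With 8 possible CPUs of which 2 and 6 are offline, threads 0..5 map to
CPUs 0, 1, 3, 4, 5, 7 and thread 6 wraps around to CPU 0 again, so the
service threads end up spread round-robin over the online CPUs and
therefore over the nodes.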

> 
> I don't know if there is a CPU affinity option for the IB interfaces,
> but that is definitely possible.
> 
> 
> Do you know if there is a particular performance problem with the current
> Lustre code, or are you only speculating?
> 

I do not know about the Lustre code; we have not had the opportunity to
run Lustre tests yet. But we carried out basic tests (using xdd and
ib_rdma_bw, for instance) on the machine, which showed that the NUMA IO
factor does harm performance. This is why we are looking for
solutions to avoid this NUMA IO factor for Lustre.


> Note that there is already work underway to make the IB driver and the
> ptlrpc service handling use per-CPU threads, so if you are interested
> to test this we could give you an early version of the patch when it is
> available.

Yes, we would be very interested in testing an early version of the
patch that makes it possible for the IB driver and the ptlrpc service
handling to use per-CPU threads. Is there a bugzilla for this?
This feature is necessary for what we are trying to achieve, but I do
not know if it will be enough. Indeed, what will ensure that all clients
that want to reach an OST use the right IB interface, i.e. the one for
which there is no NUMA IO factor to the FC adapter that connects the
LUN? What do you think?
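
To make "the right IB interface" concrete: on a given OSS one could
compare the NUMA node that sysfs reports for the IB HCA with the one
reported for the FC adapter. Below is a rough sketch, assuming the kernel
exposes a numa_node attribute under /sys/bus/pci/devices/<address>/ (it
reads -1 when the node is unknown). The function names and the
NUMA_NODE_ERROR sentinel are hypothetical, chosen for this example.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sentinel for "could not read or parse the attribute". */
#define NUMA_NODE_ERROR INT_MIN

/* Parse the text of a numa_node attribute, e.g. "1\n" or "-1\n". */
int parse_numa_node(const char *text)
{
    char *end = NULL;
    long v;

    if (text == NULL)
        return NUMA_NODE_ERROR;
    errno = 0;
    v = strtol(text, &end, 10);
    if (errno != 0 || end == text || v < -1 || v > INT_MAX)
        return NUMA_NODE_ERROR;
    return (int)v;
}

/* Read a numa_node attribute file; the caller supplies the full path. */
int read_numa_node(const char *path)
{
    char buf[32];
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return NUMA_NODE_ERROR;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return NUMA_NODE_ERROR;
    }
    fclose(f);
    return parse_numa_node(buf);
}

/* Two devices count as local to each other only when both nodes are
 * known (>= 0) and equal; -1 (unknown) never matches. */
int same_numa_node(int a, int b)
{
    return a >= 0 && a == b;
}
```

One would call read_numa_node() once with the numa_node path of the IB
HCA and once with that of the FC adapter, then pass both results to
same_numa_node(). This only tells us which pairs are local on the
server; it does not by itself make the clients choose that interface.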


Cheers,
Sebastien.


