[Lustre-discuss] NUMA IO and Lustre

Andreas Dilger adilger at sun.com
Tue May 12 12:58:52 PDT 2009


On May 12, 2009  16:07 +0200, Sébastien Buisson wrote:
> At Bull we would like to use a specific machine as a Lustre OSS server.
> This is a NUMA IO machine with 2 InfiniBand interfaces for the
> connections to the clients, and 2 Fibre Channel interfaces giving access
> to 8 LUNs.
> 
> Given this architecture, the challenge is to avoid, as much as possible,
> paying the NUMA penalty. Ideally, this would require the ability
> for Lustre to 'bind' a given OST to a given IB interface (let's
> assume we know which IB interface best suits a given FC interface).
> The goal is to ensure that no 'NUMA IO tax' is paid when data is
> transferred between an FC interface and an IB interface, i.e. when a
> Lustre client reads from or writes to an OST.

Note that the OST threads are already bound to a particular NUMA node.
This means that the pages used for the IO are CPU-local and are not
accessed from a remote CPU's cache.
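
For readers less familiar with what "bound to a NUMA node" means in practice, here is a minimal user-space sketch in Python. It is only an analogue: Lustre's OST threads are kernel threads and are bound inside the kernel, not through this interface. It assumes Linux, reads a node's CPU list from sysfs (`/sys/devices/system/node/node<N>/cpulist`) and restricts the calling process to those CPUs with `os.sched_setaffinity`. The helper names are hypothetical.

```python
import os


def parse_cpulist(text: str) -> set[int]:
    """Parse a Linux cpulist string such as "0-3,8,10-11" into CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty list (memory-only node) yields no CPUs
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def node_cpus(node: int, sysfs: str = "/sys") -> set[int]:
    """Return the CPUs that belong to NUMA node `node`, as reported by sysfs."""
    path = os.path.join(sysfs, "devices", "system", "node", f"node{node}", "cpulist")
    with open(path) as f:
        return parse_cpulist(f.read())


def pin_to_node(node: int) -> None:
    """Restrict the calling process to the CPUs of one NUMA node (Linux only)."""
    cpus = node_cpus(node)
    if not cpus:
        raise ValueError(f"NUMA node {node} has no CPUs")
    os.sched_setaffinity(0, cpus)
```

For example, `pin_to_node(1)` would confine the calling process to node 1's CPUs on a machine that has a node 1; memory it touches afterwards then tends to be allocated on that node under the kernel's default local-allocation policy.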

I don't know whether there is a CPU affinity option for the IB interfaces,
but such an option is definitely possible.
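
One way to approximate interface affinity outside of Lustre is to steer the HCA's interrupts to CPUs on the HCA's own NUMA node. The sketch below assumes Linux sysfs/procfs: it reads the interface's node from `/sys/class/net/<iface>/device/numa_node` and builds the values an administrator could write (as root) to `/proc/irq/<n>/smp_affinity_list` on kernels that provide that file. The IRQ numbers must be supplied by the administrator, and the function names are hypothetical; the code only computes a plan and writes nothing.

```python
import os


def iface_numa_node(iface: str, sysfs: str = "/sys") -> int:
    """Return the NUMA node of the PCI device behind a network interface.

    The kernel reports -1 when the platform provides no locality information.
    """
    path = os.path.join(sysfs, "class", "net", iface, "device", "numa_node")
    with open(path) as f:
        return int(f.read().strip())


def node_cpulist(node: int, sysfs: str = "/sys") -> str:
    """Return the raw cpulist string (e.g. "0-7") for a NUMA node."""
    path = os.path.join(sysfs, "devices", "system", "node", f"node{node}", "cpulist")
    with open(path) as f:
        return f.read().strip()


def irq_affinity_plan(iface: str, irqs: list[int], sysfs: str = "/sys") -> dict[str, str]:
    """Map each /proc/irq/<n>/smp_affinity_list path to the CPUs local to `iface`.

    `irqs` is supplied by the caller (e.g. looked up in /proc/interrupts);
    this helper does not try to discover which IRQs belong to the HCA.
    """
    node = iface_numa_node(iface, sysfs)
    if node < 0:
        raise ValueError(f"{iface}: kernel reports no NUMA locality")
    cpus = node_cpulist(node, sysfs)
    return {f"/proc/irq/{irq}/smp_affinity_list": cpus for irq in irqs}
```

For example, `irq_affinity_plan("ib1", [90, 91])` returns one entry per IRQ, where 90 and 91 stand in for whatever IRQ numbers the HCA actually uses.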

> Concretely, we would like to know whether it is possible in Lustre to bind an
> OST to a specific network interface, so that this OST is only reached
> through that interface (thus avoiding the NUMA IO factor in our case).
> For instance, we would like to have 4 OSTs attached to ib0 and the
> other 4 OSTs attached to ib1.
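
The placement being asked for can at least be computed mechanically. This hypothetical helper assumes that, as stated in the question, the NUMA node behind each OST's FC path is known, along with the node of each IB interface. It groups each OST under an IB interface on the same node, round-robin if a node has several. It does not configure Lustre or LNet in any way; whether Lustre can enforce such a placement is the open question here. OST names and node numbers are illustrative.

```python
from collections import defaultdict


def plan_ost_interfaces(ost_nodes: dict[str, int],
                        iface_nodes: dict[str, int]) -> dict[str, list[str]]:
    """Group OSTs under an IB interface that shares their NUMA node.

    ost_nodes:   OST name -> NUMA node of the FC HBA serving its LUN (assumed known)
    iface_nodes: IB interface name -> NUMA node of its HCA
    Raises ValueError if an OST's node has no local IB interface.
    """
    by_node: dict[int, list[str]] = defaultdict(list)
    for iface, node in sorted(iface_nodes.items()):
        by_node[node].append(iface)

    plan: dict[str, list[str]] = {iface: [] for iface in iface_nodes}
    assigned: dict[int, int] = defaultdict(int)  # OSTs placed so far, per node
    for ost, node in sorted(ost_nodes.items()):
        local = by_node.get(node)
        if not local:
            raise ValueError(f"{ost}: no IB interface on NUMA node {node}")
        # Round-robin across the interfaces that are local to this node.
        iface = local[assigned[node] % len(local)]
        assigned[node] += 1
        plan[iface].append(ost)
    return plan
```

With OST0000 to OST0003 on node 0, OST0004 to OST0007 on node 1, and ib0/ib1 on nodes 0 and 1, this yields exactly the 4 + 4 split described in the question.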

Do you know if there is a particular performance problem with the current
Lustre code, or are you only speculating?

Note that there is already work underway to make the IB driver and the
ptlrpc service handling use per-CPU threads, so if you are interested
in testing this we could give you an early version of the patch when it
is available.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



