[Lustre-discuss] Multiple IB ports
Rick Mohr
rmohr at utk.edu
Fri Apr 1 14:34:09 PDT 2011
On Sun, 2011-03-20 at 22:53 -0500, Brian O'Connor wrote:
> Anybody actually using multiple IB ports on a client for an
> aggregated connection?
I am trying to do something like what you mentioned. I am working on a
machine with multiple IB ports, but rather than trying to aggregate
links, I am just trying to direct Lustre traffic over different IB ports
so there will essentially be a single QDR IB link dedicated to each
MDS/OSS server. Below are some of the main details. (I can provide
more detailed info if you think it would be useful.)
The storage is a DDN SFA10k couplet with 28 LUNs. Each controller in
the couplet has 4 QDR IB ports, but only 2 on each controller are
connected to the IB fabric. There is a single MGS/MDS server and 4 OSS
servers. All servers have a single QDR IB port connected to the fabric.
Each OSS node logs in via SRP to a different DDN port and serves out 7 of
the 28 OSTs. The lustre client is an SGI UV1000 (1024 cores, 4TB RAM)
with 24 QDR IB ports (of which we are currently only using 5 ports).
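For anyone unfamiliar with the SRP side of this, the per-OSS target attachment looks roughly like the sketch below. The device names, sysfs path suffix, and target parameters in angle brackets are placeholders, not values from our actual setup:

```shell
# Hypothetical sketch: attach one OSS to a single DDN controller port via SRP.
# Discover SRP targets visible through the local HCA port; -c prints them
# in the format the kernel SRP initiator accepts:
ibsrpdm -c -d /dev/infiniband/umad0

# Pick the line corresponding to the desired DDN port and hand it to the
# SRP initiator (the srp-<hca>-<port> directory name is system-specific):
echo "id_ext=<id>,ioc_guid=<guid>,dgid=<dgid>,pkey=ffff,service_id=<sid>" \
    > /sys/class/infiniband_srp/srp-mlx4_0-1/add_target
```

In practice srp_daemon can automate the discovery/login step, but doing it by hand makes it easy to pin each OSS to exactly one DDN port.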
The 5 MDS/OSS servers have their single IB ports configured on 2
different lnets. All 5 servers have o2ib0 configured as well as a
specific lnet for that server (oss1 -> o2ib1, oss2 -> o2ib2, ...,
mds -> o2ib5). The client has lnets o2ib[1-5] configured (one on each of
the 5 IB ports). I also had to configure some static ip routes on the
client so that each lustre server could ping the corresponding port on
the client.
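As a rough illustration, the lnet setup amounts to modprobe options like the following. The interface names, subnet, and addresses are made up for the example, not taken from our machines:

```shell
# Hypothetical /etc/modprobe.d/lustre.conf entries.

# On oss1: the single physical port (ib0) carries both the shared lnet
# and the lnet dedicated to this server:
#   options lnet networks="o2ib0(ib0),o2ib1(ib0)"

# On the UV client: one lnet per physical IB port:
#   options lnet networks="o2ib1(ib0),o2ib2(ib1),o2ib3(ib2),o2ib4(ib3),o2ib5(ib4)"

# On the client, a static route per server so that traffic to/from oss1
# uses the client port on oss1's lnet (addresses are placeholders):
ip route add 192.168.1.0/24 dev ib0 src 192.168.1.10
```

Without the static routes, replies from the client can go out whichever port the routing table picks first, which defeats the one-link-per-server layout.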
I am still doing performance testing and playing around with
configuration parameters. In general, I am getting performance that is
better than using a single QDR IB link, but it certainly is not scaling
up linearly. I can't say for sure where the bottleneck is. It could be
a misconfiguration on my part, some limitation I am hitting within
lustre, or just the natural result of running lustre on a giant
single-system-image SMP machine. (Although I am pretty sure that at least part
of the problem is due to poor NUMA remote memory access performance.)
--
Rick Mohr
HPC Systems Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu/