[Lustre-discuss] Proposed LNET config

Erik Froese erik.froese at gmail.com
Tue Oct 13 18:30:18 PDT 2009


Hello List,

I wanted to outline our proposed configuration and hope for some feedback
from the list on whether the LNET config is sound.

We're planning a site-wide (HPC clusters) Lustre filesystem at NYU to be
installed in the next few weeks. Lustre version 1.8.1.
The MDS, OSS, and router nodes will be running Red Hat 5.3 with Lustre kernels.
Our cluster compute/login nodes will be running Rocks/Red Hat 5.1 with Lustre
kernels.

We've installed a small test cluster (1 MDS/MGS, 2 OSS, and 4 compute
clients) with the same versions and it works well.

Note: Sun is going to be onsite to install the MDS failover pair as part of
their "Lustre Building Blocks" (Hi Hung-Sheng)

Here goes.

Core Lustre Network:
We have two Mellanox MTS3600 36-Port QDR switches. I thought we'd configure
one as o2ib0, the other o2ib1.
There's also the possibility of combining them into a single fabric and
connecting one port of each dual port HCA to each switch.

Servers on the "Core Lustre Network" will be known as
$SERVER_TYPE-$SERVER_NAME-$IB_RANK, so we have:

2 MDS/MGS servers configured as a failover pair. Each has a dual port IB QDR
HCA.
MDS servers would be on both core networks, o2ib0 and o2ib1.
mds-0-0 #o2ib0
mds-0-1 #o2ib1
mds-1-0 #o2ib0
mds-1-1 #o2ib1

4 OSS servers will be configured as 2 sets of failover pairs. Each has a
dual port IB QDR HCA.
OSS servers would be on both core networks, o2ib0 and o2ib1.
oss-0-0 #o2ib0
oss-0-1 #o2ib1
...
oss-3-0
oss-3-1
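
For both the MDS and OSS nodes, the modprobe.conf I have in mind is roughly
the following (a sketch only: ib0/ib1 are assumed to be the two ports of the
dual port HCA, core.router.ip.[1-2] stands in for the routers' core-side
addresses, and there would be one routes entry like this per cluster network):

# servers sit on both core rails; route to each cluster's private network via its routers
options lnet networks="o2ib0(ib0),o2ib1(ib1)" routes="o2ib4 core.router.ip.[1-2]@o2ib0"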

Each cluster (currently 3) will have 2 routers on the "Lustre Core"
network(s) and the "Private Cluster IB" network.
2 of the clusters have DDR private IB networks. The other cluster has a QDR
private IB network. I know the 2 QDR switches are overkill
but they were relatively cheap and should survive adding more clients and
storage.

The DDR clusters will each have 2 routers, each with 1 dual port QDR HCA
(core) and 1 single-port DDR HCA (private).
The QDR cluster's routers will each have 2 dual port QDR HCAs: 1 core, 1 private.
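
To make the routing piece concrete, a DDR cluster's router would have a
modprobe.conf along these lines (interface names are assumptions; ib0 is the
core-side port, ib2 the private-side port, and o2ib4 is that cluster's private
network as in the examples below):

# router straddles one core rail and the cluster's private IB, with LNET forwarding on
options lnet networks="o2ib0(ib0),o2ib4(ib2)" forwarding="enabled"

Depending on which of the two options below we go with, the routers would list
either one or both core networks.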

Here's where I'm not 100000% sure about the proper LNET config.
Let's assume the cluster we're talking about is o2ib4

1. Each cluster sees only one core network, o2ib0 OR o2ib1.
This roughly corresponds to the multi-rail config in the manual, but balancing
by cluster (not perfect, I know).

The routers would be configured with 1 address on the "Lustre Core Network"
and 1 address on the "Private Cluster IB".
Clients would specify mds-0-0:mds-1-0 or mds-0-1:mds-1-1 as the metadata
failover pair when mounting the filesystem, with both failover NIDs on the
same core network (o2ib0 in the first case, o2ib1 in the second).
options lnet networks="o2ib4(ib0)" routes="o2ib0 local.router.ip.[1-2]@o2ib4;"
or
options lnet networks="o2ib4(ib0)" routes="o2ib1 local.router.ip.[1-2]@o2ib4;"
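
With that, the client mount line would be something like the following
(hostnames as shorthand for the MDS IPoIB addresses on the core network;
fsname and mount point are made up here):

mount -t lustre mds-0-0@o2ib0:mds-1-0@o2ib0:/lustre /mnt/lustre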

2. Each cluster uses both core networks
The routers would be configured with 1 address on o2ib0, 1 address on o2ib1,
and 1 address on the "Private Cluster IB".
Compute clients would specify mds-0-0,mds-0-1:mds-1-0,mds-1-1 as the metadata
failover pair when mounting.
Compute clients would use:
options lnet networks="o2ib4(ib0)" routes="o2ib0 local.router.ip.[1-2]@o2ib4;
o2ib1 local.router.ip.[1-2]@o2ib4;"
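
The mount line would then carry both NIDs per MDS node, commas within a node
and a colon between the failover nodes, something like (again, fsname and
mount point are made up):

mount -t lustre mds-0-0@o2ib0,mds-0-1@o2ib1:mds-1-0@o2ib0,mds-1-1@o2ib1:/lustre /mnt/lustre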

Will that work?
If one switch fails, clients should fail over to the other MDS address pair,
as both addresses on the failed core network become unreachable.
If an MDS fails, the clients should stay on the same o2ib network but fail
over to the other MDS.

Is this possible? I would think that even in the second configuration we'd
want to manually balance the traffic.
Some clients would specify o2ib0 first while others specify o2ib1 first.
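
Either way, I figure we can sanity-check the routing before going live:
lctl list_nids on each router should show a NID on every attached network,
and from a compute node something like

lctl ping 10.0.0.10@o2ib0

(with a real MDS or OSS NID in place of that placeholder address) should
succeed through the routers.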

Erik