[Lustre-discuss] LNET over 2 x 10 GbE switches
Andrus, Brian Contractor
bdandrus at nps.edu
Mon Jul 8 23:07:02 PDT 2013
You are running into the same kind of problem you hit when growing a SAN by adding more Fibre Channel switches.
Option 2 is a good way to go if you can spare the switch ports for the trunk. I would still consider it a stop-gap until you are able to purchase a more powerful core switch with enough ports.
Otherwise, once you outgrow this setup, you will end up with several small islands of switches, with the traffic between them throttled by the inter-switch links, no matter how you lay it out.
We experienced the same thing, although we were running InfiniBand for the backend. Our cluster outgrew the InfiniBand switch, so we got another one and just dealt with it until we were able to procure a larger one.
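A rough way to size that inter-switch throttling for option 2 is to compare the traffic that may need to cross the 4 x 10GbE trunk with the trunk's capacity. The following is a back-of-the-envelope Python sketch, not a measurement. The 24 host ports per switch and the assumption that about half of the balance-alb traffic lands on a peer interface on the other switch are hypothetical placeholders, not figures from this thread, so substitute your own.

```python
def trunk_oversubscription(edge_ports: int, port_gbps: float,
                           cross_fraction: float,
                           trunk_links: int, link_gbps: float) -> float:
    """Ratio of worst-case cross-switch demand to trunk capacity (one direction).

    edge_ports:     host-facing ports on one switch (hypothetical count)
    port_gbps:      line rate of each host port
    cross_fraction: assumed share of traffic whose peer interface is on the other switch
    trunk_links:    number of links in the inter-switch trunk
    link_gbps:      line rate of each trunk link
    """
    if trunk_links <= 0 or link_gbps <= 0:
        raise ValueError("trunk must have positive capacity")
    # Traffic that must leave this switch over the trunk if every host port runs at line rate.
    demand = edge_ports * port_gbps * cross_fraction
    # Aggregate trunk capacity; optimistic, since link aggregation hashes per flow
    # and rarely spreads load perfectly evenly across the trunk members.
    capacity = trunk_links * link_gbps
    return demand / capacity


if __name__ == "__main__":
    # Hypothetical: 24 host ports per switch at 10GbE, half the traffic crossing, 4 x 10GbE trunk.
    ratio = trunk_oversubscription(24, 10.0, 0.5, 4, 10.0)
    print(f"trunk oversubscription: {ratio:.1f}:1")
```

A ratio above 1 means that at full load the trunk, not the host NICs, caps cross-switch throughput. Adding more switches in the same way only multiplies the number of such bottlenecks.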
Naval Postgraduate School
From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Alexander Oltu
Sent: Monday, July 08, 2013 6:42 AM
To: lustre-discuss at lists.lustre.org
Subject: [Lustre-discuss] LNET over 2 x 10 GbE switches
We are expanding our Lustre-over-10GbE (TCP) setup. We are going to add a few more OSSes and another 10GbE switch because we need more ports.
All OSSes and the MDS have 2 x 10GbE interfaces bonded in balance-alb mode (the same goes for the clients).
For the new setup we have a few choices for connecting the switches:
1. Just add the new switch, move the second interface of every server to it, and reconfigure all clients and servers to use ARP monitoring.
(We may need to change the bonding mode to balance-rr; we will test it.)
The worrying part is that the ARP monitoring traffic will add network noise that grows with the number of clients and servers, so I would prefer to keep MII monitoring.
2. Set up trunking between the switches, connect them with 4 x 10GbE links, and move the second server interfaces to the new switch. In this case we can keep balance-alb bonding and MII monitoring.
To me the second option looks like the better way to go. I expect that with balance-alb mode, Lustre clients and servers will be able to go through the other switch when the connection through the local switch is too busy, and that the filesystem will stay online if one switch goes down.
Is there a recommended way to connect switches for LNET?
Does anyone already have experience with this?
The basic requirements are maximum throughput and continued access to the filesystem if one switch goes down.
Lustre-discuss mailing list
Lustre-discuss at lists.lustre.org