[Lustre-discuss] Lustre routers capabilities
D. Marc Stearman
marc at llnl.gov
Thu Apr 10 08:09:19 PDT 2008
Sebastien,
For the most part we try to match the bandwidth of the disks to the
network bandwidth and to the number of routers needed. I will be at
the Lustre User Group meeting in Sonoma, CA at the end of this month,
giving a talk about Lustre at LLNL, including our network design and
router usage, but here is a quick description.
We have a large federated ethernet core. We then have edge switches
for each of our clusters that have links up to the core, and back
down to the routers or tcp-only clients. In a typical situation, if
we think one file system can achieve 20 GB/s based on disk bandwidth,
we try to make sure that the filesystem cluster has 20 GB/s network
bandwidth (1GigE, 10GigE, etc), and that the routers for the compute
cluster total up to 20 GB/s as well. So we may have a server cluster
with servers having dual GigE links, and routers with 10 GigE links,
and we just try to match them up so the numbers are even. Typically,
the routers in a cluster are the same node type as the compute
cluster, just populated with additional network hardware.
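The matching arithmetic above can be sketched as a quick back-of-the-envelope calculation. All figures below are hypothetical round numbers, not LLNL's actual configuration:

```python
# Rough router/server sizing, following the rule of thumb above:
# match disk bandwidth, total server link bandwidth, and total router
# link bandwidth. Numbers are illustrative only.
import math

disk_bw_gbs = 20.0           # aggregate disk bandwidth of one filesystem, GB/s
server_link_gbs = 2 * 0.125  # dual GigE per server: 2 x ~0.125 GB/s
router_link_gbs = 1.25       # single 10 GigE per router: ~1.25 GB/s

servers_needed = math.ceil(disk_bw_gbs / server_link_gbs)
routers_needed = math.ceil(disk_bw_gbs / router_link_gbs)

print(servers_needed, routers_needed)  # 80 servers, 16 routers
```

With those assumed link speeds, 20 GB/s of disk works out to roughly 80 dual-GigE servers and 16 ten-GigE routers, so the three stages are evenly matched.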
In the future, we will likely be building a router cluster that will
bridge our existing federated ethernet core to a large InfiniBand
network, but that is at least one year away.
Most of our routers are rather simple: they have one high-speed
interconnect HCA (Quadrics, Mellanox IB) and one network card (dual
GigE, or single 10 GigE). I don't think we've hit any bus bandwidth
limitation, and I haven't seen any of them really pressed for CPU or
memory. We do make sure to turn off irq_affinity when we have a
single network interface (the 10 GigE routers), and we've had to tune
the buffers and credits on the routers to get better throughput. We
have noticed a problem with serialization of checksum processing on a
single core (bz #14690).
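The buffer and credit tuning mentioned above is normally done through LNET module parameters on the router nodes. A sketch of what such a fragment might look like; the parameter values here are purely illustrative, not the settings actually used at LLNL:

```shell
# /etc/modprobe.conf fragment on a router node -- illustrative values only.
# The lnet router buffers control how many forwarded messages the router
# can hold in flight; credits/peer_credits on the LND control concurrency.
options lnet forwarding="enabled"
options lnet tiny_router_buffers=1024 small_router_buffers=8192 large_router_buffers=512
options ksocklnd credits=256 peer_credits=8
```

Raising the router buffers and credits trades memory on the router for more outstanding RPCs, which is usually what is needed to keep a fast link busy.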
The beauty of routers, though, is that if you find they are all
running at capacity, you can always add a couple more and move the
bottleneck to the network or disks. We find we are mostly slowed
down by the disks.
-Marc
----
D. Marc Stearman
LC Lustre Administration Lead
marc at llnl.gov
925.423.9670
Pager: 1.888.203.0641
On Apr 10, 2008, at 1:06 AM, Sébastien Buisson wrote:
> Let's consider that the internal bus of the machine is big enough
> that it will not be saturated. In that case, what will be the limiting
> factor? Memory? CPU?
> I know that it depends on how many I/B cards are plugged into the
> machine, but generally speaking, is the routing activity CPU or
> memory hungry?
>
> By the way, are there people on that list that have feedback about
> Lustre routers sizing? For instance, I know that Lustre routers have
> been set up at LLNL. What is the throughput obtained via the
> routers, compared to the raw bandwidth of the interconnect?
>
> Thanks,
> Sebastien.
>
>
> Brian J. Murrell wrote:
>> On Wed, 2008-04-09 at 19:07 +0200, Sébastien Buisson wrote:
>>> I mean, if I
>>> have an available bandwidth of 100 on each side of a router, what
>>> will be
>>> the max reachable bandwidth from clients on one side of the router to
>>> servers on the other side of the router? Is it 50? 80? 99? Is the
>>> routing process CPU or memory hungry?
>>
>> While I can't answer these things specifically, another important
>> consideration is the bus architecture involved. How many I/B
>> cards can
>> you put on a bus before you saturate the bus?
>>
>> b.
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss