[Lustre-discuss] Lustre routers capabilities

D. Marc Stearman marc at llnl.gov
Thu Apr 10 08:09:19 PDT 2008


Sebastien,

For the most part we try to match the bandwidth of the disks to the
bandwidth of the network and to the number of routers needed.  I will
be at the Lustre User Group meeting in Sonoma, CA at the end of this
month giving a talk about Lustre at LLNL, including our network design
and router usage, but here is a quick description.

We have a large federated ethernet core.  We then have edge switches
for each of our clusters, with links up to the core and back down to
the routers or tcp-only clients.  In a typical situation, if we think
one file system can achieve 20 GB/s based on disk bandwidth, we try to
make sure that the filesystem cluster has 20 GB/s of network bandwidth
(1GigE, 10GigE, etc), and that the routers for the compute cluster
total up to 20 GB/s as well.  So we may have a server cluster with
servers having dual GigE links, and routers with 10 GigE links, and we
just try to match them up so the aggregate bandwidth is the same on
both sides.  Typically, the routers in a cluster are the same node
type as the compute nodes, just populated with additional network
hardware.
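
To make that matching concrete, here is a rough back-of-the-envelope
sketch in Python using the numbers above; the link rates and node
counts are illustrative assumptions, not our actual configuration:

    # Rough bandwidth matching: disks vs. server NICs vs. router NICs.
    # Illustrative figures only, not a description of a real cluster.
    GIGE_GBPS = 0.125        # ~1 Gbit/s link expressed in GB/s
    TEN_GIGE_GBPS = 1.25     # ~10 Gbit/s link expressed in GB/s

    disk_bw = 20.0                       # GB/s the disks can sustain
    per_server_bw = 2 * GIGE_GBPS        # dual GigE per server
    per_router_bw = 1 * TEN_GIGE_GBPS    # single 10 GigE per router

    servers_needed = disk_bw / per_server_bw   # -> 80 servers
    routers_needed = disk_bw / per_router_bw   # -> 16 routers
    print(f"servers: {servers_needed:.0f}, routers: {routers_needed:.0f}")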

In the future, we will likely be building a router cluster that will
bridge our existing federated ethernet core to a large InfiniBand
network, but that is at least one year away.

Most of our routers are rather simple: they have one high speed
interconnect HCA (Quadrics, Mellanox IB) and one network card (dual
GigE, or single 10 GigE).  I don't think we've hit any bus bandwidth
limitation, and I haven't seen any of them really pressed for CPU or
memory.  We do make sure to turn off irq_affinity when we have a
single network interface (the 10 GigE routers), and we've had to tune
the buffers and credits on the routers to get better throughput.  We
have noticed a problem with serialization of checksum processing on a
single core (bz #14690).
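
For reference, the buffer and credit tuning mentioned above is
normally done through LNET module options on the router nodes.  The
fragment below is only a generic sketch: the network names, interfaces
and values are placeholders, not the settings we run at LLNL.

    # /etc/modprobe.conf fragment on an LNET router node (placeholder
    # values).  The node bridges an IB fabric (o2ib0) and the ethernet
    # core (tcp0); forwarding="enabled" makes it act as a router.
    options lnet networks="o2ib0(ib0),tcp0(eth0)" forwarding="enabled"
    # Router buffer pools; larger pools help absorb bursts of routed
    # traffic at the cost of memory on the router.
    options lnet tiny_router_buffers=1024 small_router_buffers=8192 large_router_buffers=512
    # Credits on the socket LND side control how many messages can be
    # in flight per interface and per peer.
    options ksocklnd credits=256 peer_credits=8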

The beauty of routers, though, is that if you find they are all
running at capacity, you can always add a couple more and move the
bottleneck to the network or disks.  We find we are mostly slowed
down by the disks.

-Marc

----
D. Marc Stearman
LC Lustre Administration Lead
marc at llnl.gov
925.423.9670
Pager: 1.888.203.0641



On Apr 10, 2008, at 1:06 AM, Sébastien Buisson wrote:
> Let's consider that the internal bus of the machine is big enough
> that it will not be saturated.  In that case, what will be the
> limiting factor?  Memory?  CPU?
> I know that it depends on how many I/B cards are plugged into the
> machine, but generally speaking, is the routing activity CPU or
> memory hungry?
>
> By the way, are there people on that list that have feedback about
> Lustre routers sizing? For instance, I know that Lustre routers have
> been set up at the LLNL. What is the throughput obtained via the
> routers, compared to the raw bandwidth of the interconnect?
>
> Thanks,
> Sebastien.
>
>
> Brian J. Murrell wrote:
>> On Wed, 2008-04-09 at 19:07 +0200, Sébastien Buisson wrote:
>>> I mean, if I have an available bandwidth of 100 on each side of a
>>> router, what will be the max reachable bandwidth from clients on
>>> one side of the router to servers on the other side of the router?
>>> Is it 50? 80? 99? Is the routing process CPU or memory hungry?
>>
>> While I can't answer these things specifically, another important
>> consideration is the bus architecture involved.  How many I/B cards
>> can you put on a bus before you saturate the bus?
>>
>> b.



