[Lustre-discuss] Lustre routers capabilities

Sébastien Buisson sebastien.buisson at bull.net
Thu Apr 10 09:20:00 PDT 2008


Hello Marc,

Thank you for this feedback. This is a very thorough description of 
how you set up routers at LLNL.
Just one question, however: you say that a simple way to increase 
routing bandwidth is to add more Lustre routers, so that they are not 
the bottleneck in the cluster. But at LLNL, how do you deal with the 
Lustre routing configuration when you add new routers? I mean, how is 
the network load balanced between all the routers? Is it done in a 
dynamic way that supports adding or removing routers?
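
(For context, the routing setup I have in mind is the usual static LNET 
configuration in modprobe.conf. This is only a sketch, with made-up NIDs, 
of what I understand the client-side piece to look like:

    # clients on the InfiniBand side reach tcp0 through four routers
    options lnet networks="o2ib0(ib0)" routes="tcp0 10.10.0.[1-4]@o2ib0"

so adding or removing a router would seem to mean editing that routes 
list on every client and reloading LNET, which is what prompts the 
question.)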

Sebastien.


D. Marc Stearman wrote:
> Sebastien,
> 
> For the most part, we try to match the bandwidth of the disks to the 
> network and to the number of routers needed.  I will be at the Lustre 
> User Group meeting in Sonoma, CA at the end of this month, giving a 
> talk about Lustre at LLNL, including our network design and router 
> usage, but here is a quick description.
> 
> We have a large federated ethernet core.  We then have edge switches 
> for each of our clusters that have links up to the core, and back 
> down to the routers or tcp-only clients.  In a typical situation, if 
> we think one file system can achieve 20 GB/s based on disk bandwidth, 
> we try to make sure that the filesystem cluster has 20 GB/s of network 
> bandwidth (1GigE, 10GigE, etc.), and that the routers for the compute 
> cluster total up to 20 GB/s as well.  So we may have a server cluster 
> with servers having dual GigE links, and routers with 10 GigE links, 
> and we just try to match them up so the numbers are even.  Typically, 
> the routers in a cluster are the same node type as the compute 
> cluster, just populated with additional network hardware.
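> 
> As a rough back-of-the-envelope illustration (the node counts here are 
> hypothetical, not our actual configuration):
> 
>     disk bandwidth target:       20 GB/s
>     10 GigE router, usable:      ~1 GB/s each    -> ~20 routers
>     dual-GigE server, usable:    ~0.2 GB/s each  -> ~100 servers
> 
> The point is just that all three numbers should come out roughly even.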
> 
> In the future, we will likely be building a router cluster that will 
> bridge our existing federated ethernet core to a large InfiniBand 
> network, but that is at least one year away.
> 
> Most of our routers are rather simple: they have one high speed 
> interconnect HCA (Quadrics, Mellanox IB) and one network card (dual 
> GigE, or single 10 GigE).  I don't think we've hit any bus bandwidth 
> limitation, and I haven't seen any of them really pressed for CPU or 
> memory.  We do make sure to turn off irq_affinity when we have a 
> single network interface (the 10 GigE routers), and we've had to tune 
> the buffers and credits on the routers to get better throughput.  We 
> have noticed a problem with serialization of checksum processing on a 
> single core (bz #14690).
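> 
> For what it's worth, that tuning lives in the LNET module options on 
> the routers.  A sketch of the kind of thing we mean (the values here 
> are illustrative, not our production settings):
> 
>     # enable forwarding and enlarge the router buffer pools
>     options lnet forwarding=enabled
>     options lnet tiny_router_buffers=1024 small_router_buffers=8192 large_router_buffers=512
>     # more credits per TCP connection / peer
>     options ksocklnd credits=256 peer_credits=16
> 
> The right values depend on the client count and the networks involved.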
> 
> The beauty of routers, though, is that if you find that they are all 
> running at capacity, you can always add a couple more, and move the 
> bottleneck to the network or disks.  We find we are mostly slowed 
> down by the disks.
> 
> -Marc
> 
> ----
> D. Marc Stearman
> LC Lustre Administration Lead
> marc at llnl.gov
> 925.423.9670
> Pager: 1.888.203.0641
> 
> 
> 
> On Apr 10, 2008, at 1:06 AM, Sébastien Buisson wrote:
>> Let's consider that the internal bus of the machine is big enough 
>> that it will not be saturated. In that case, what will be the 
>> limiting factor? Memory? CPU?
>> I know that it depends on how many IB cards are plugged into the 
>> machine, but generally speaking, is the routing activity CPU or 
>> memory hungry?
>>
>> By the way, are there people on this list who have feedback about 
>> Lustre router sizing? For instance, I know that Lustre routers have 
>> been set up at LLNL. What is the throughput obtained via the 
>> routers, compared to the raw bandwidth of the interconnect?
>>
>> Thanks,
>> Sebastien.
>>
>>
>> Brian J. Murrell wrote:
>>> On Wed, 2008-04-09 at 19:07 +0200, Sébastien Buisson wrote:
>>>> I mean, if I have an available bandwidth of 100 on each side of a 
>>>> router, what will be the max reachable bandwidth from clients on 
>>>> one side of the router to servers on the other side of the router? 
>>>> Is it 50? 80? 99? Is the routing process CPU or memory hungry?
>>> While I can't answer these things specifically, another important 
>>> consideration is the bus architecture involved.  How many IB cards 
>>> can you put on a bus before you saturate the bus?
>>>
>>> b.
>>>
>>>
>>>
> 
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
> 
> 


