[Lustre-devel] Lnet routing preferences
D. Marc Stearman
marc at llnl.gov
Wed Aug 4 08:41:43 PDT 2010
Right, so you would want to create your file system (or modify via
tunefs.lustre) to put the management network's NIDs as the first --
params. I don't think it will avoid all traffic going over the o2ib0,
but perhaps minimize it.
-Marc
----
D. Marc Stearman
Lustre Operations Lead
marc at llnl.gov
925.423.9670
Pager: 1.888.203.0641
On Aug 4, 2010, at 8:39 AM, Ben Evans wrote:
> Yes, this is exactly what I'm looking at. From the hints that Eric
> provided, and from my digging, it looks like there is a quick check to
> see which connection has the shortest queue (along with the number of
> hops and a few other things) and uses that one. If they're equal it
> prefers the first connection in the list.
>
> -----Original Message-----
> From: D. Marc Stearman [mailto:marc at llnl.gov]
> Sent: Wednesday, August 04, 2010 11:25 AM
> To: Eric Barton
> Cc: Ben Evans; 'lustre-devel'
> Subject: Re: [Lustre-devel] Lnet routing preferences
>
> I think what Ben is trying to say is something like this:
>
> You have a small gigabit management network for your server cluster,
> say tcp0, that would be used just for server-to-server communication,
> i.e. precreate requests from the MDS to the OSS nodes. You want all of
> your clients to mount and pass data over your o2ib0 network.
> Presumably you create your file system with NIDs on both tcp0 and
> o2ib0. Clients would mount using mdsnid@o2ib0:/fsname, which would
> force the client traffic to use the IB network since that is all they
> are connected to. How does LNET decide which network, tcp0 or o2ib0,
> to use for server traffic? My understanding is that connections will
> be set up on both networks since the servers have NIDs on both, so
> does LNET use the local network with the shortest queue, or does it
> round robin between them?
>
> -Marc
>
> On Aug 3, 2010, at 10:30 PM, Eric Barton wrote:
>
>> Ben,
>>
>>> From: lustre-devel-bounces at lists.lustre.org [mailto:lustre-devel-bounces at lists.lustre.org] On Behalf Of Ben Evans
>>> Sent: 27 July 2010 11:04 AM
>>>
>>> I've been poking around and experimenting with the Lustre internals
>>> on my own, and ran into a question that I haven't been able to track
>>> down.
>>>
>>
>>> For MDS/OSS communications, where there are multiple possible paths
>>> (Ethernet, IB, etc.) how does LNET (or Lustre) decide which
>>> interface to send messages?
>>
>> First a bit of explanation...
>>
>> LNET node addressing is driven by the idea that since an arbitrary
>> network topology requires O(n**2) routing tables, it would be good to
>> limit the 'n' as much as possible :-)
>>
>> When Peter Braam and I were discussing how to finesse this issue in
>> early implementations of LNET routing, we observed that since Lustre
>> is a cluster file system spanning compute clusters, storage clusters
>> and mixtures of both, a 2-level addressing scheme which assumes flat
>> connectivity within clusters but arbitrary connectivity between
>> clusters, limits the 'n' to the number of clusters rather than the
>> total number of nodes. That's why an LNET NID is the concatenation of
>> the network and the node-within-network.
>>
>> Now to your question...
>>
>> When LNET routes a message, it first checks whether the destination
>> NID is in a local network. If so, it passes the message to the local
>> interface on that network.
>>
>> If the destination is not local, LNET looks up the destination network
>> in its route table. The route table lists all the NIDs of LNET
>> routers on local networks that could forward the message to its
>> eventual destination with a minimum number of hops. LNET then chooses
>> the router with the shortest queue.
>>
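The two-step decision Eric describes above (deliver directly if the destination network is local, otherwise pick the minimum-hop router with the shortest queue) can be sketched roughly as follows. This is only an illustration in Python, not LNET code: the Router class, its fields, and the next_hop function are hypothetical names, and NIDs are written in the usual address@network form. Breaking ties in favour of the first entry is an assumption borrowed from Ben's observation elsewhere in this thread.

```python
from dataclasses import dataclass


@dataclass
class Router:
    """Hypothetical route-table entry; not an LNET structure."""
    nid: str        # router NID on a local network, e.g. "r1@o2ib0"
    hops: int       # hops from this router to the destination network
    queue_len: int  # messages currently queued towards this router


def net_of(nid):
    """Return the network part of a NID written as address@network."""
    return nid.rsplit("@", 1)[1]


def next_hop(dest_nid, local_nets, routes):
    """Return the NID the message should be handed to next.

    local_nets: set of network names this node has an interface on.
    routes: dict mapping a remote network name to a list of Router entries.
    """
    net = net_of(dest_nid)
    if net in local_nets:
        # Destination is on a local network: send directly.
        return dest_nid
    candidates = routes.get(net, [])
    if not candidates:
        raise LookupError("no route to network " + net)
    # Keep only routers offering the minimum hop count ...
    min_hops = min(r.hops for r in candidates)
    best = [r for r in candidates if r.hops == min_hops]
    # ... then take the shortest queue; min() keeps the first entry on ties.
    return min(best, key=lambda r: r.queue_len).nid
```

For example, with routers r1 (1 hop, queue 5), r2 (1 hop, queue 2) and r3 (2 hops, queue 0) all leading to tcp0, a node that is only on o2ib0 would hand a message for y@tcp0 to r2, since r3 is ruled out by hop count before queue length is considered.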
>>> Ideally, I'd like to send server-to-server messages over a private
>>> network and let the clients communicate over the public network
>>
>> Note that the choice of destination NID is in itself a routing
>> decision if there are potentially several to choose from. For
>> example, if I have NIDs x1@o2ib0 and y1@tcp0 and you have NIDs
>> x2@o2ib0 and y2@tcp0, then whether communications between us are
>> routed over o2ib0 or tcp0 is completely determined by the choice of
>> NID handed to LNET, not by LNET itself.
>>
>> So if you want to communicate over a server-only network, you just
>> need to use server-only NIDs.
>>
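To make the point about NID choice concrete, here is a small Python sketch of the caller's side of that decision. It assumes NIDs written as address@network and uses a hypothetical helper, pick_peer_nid, which is not part of Lustre or LNET. The idea is simply that whoever hands the NID to LNET has already chosen the network.

```python
def pick_peer_nid(peer_nids, preferred_net):
    """Return the first peer NID on preferred_net, or None if there is none.

    peer_nids: list of NID strings in address@network form.
    preferred_net: network name such as "tcp0" or "o2ib0".
    """
    for nid in peer_nids:
        # The network is everything after the last "@".
        _addr, _sep, net = nid.rpartition("@")
        if net == preferred_net:
            return nid
    return None


# A peer reachable on both networks, as in the example above.
peer = ["x2@o2ib0", "y2@tcp0"]
server_only = pick_peer_nid(peer, "tcp0")    # traffic would go over tcp0
client_side = pick_peer_nid(peer, "o2ib0")   # traffic would go over o2ib0
```

Under these assumptions, passing server_only to the messaging layer keeps that conversation on tcp0, and nothing further needs to be decided by the routing code.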
>> Note however that this requirement may conflict with the desire to do
>> link aggregation for performance/failover. We've been considering
>> using NIDs in a way which is much more like conventional IP networks,
>> i.e. where the upper levels can specify any destination NID and LNET
>> takes a bigger part in the decision about which network to use.
>>
>> Isaac Huang has been thinking about link aggregation for a while and
>> may care to comment on whether he has considered private networks like
>> this.
>>
>>> I'm interested in finding out if there are any gains to be made from
>>> a setup like this.
>>
>> Yes, you could benefit from avoiding any congestion created by client
>> communications.
>>
>> But I must ask - what is it that you want to communicate between
>> servers like this and are you sure you're not introducing a scaling or
>> deadlock issue?
>>
>> Cheers,
>> Eric
>>
>> Eric Barton
>> CTO Whamcloud Inc.
>>
>>
>> _______________________________________________
>> Lustre-devel mailing list
>> Lustre-devel at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>
>
>