[lustre-discuss] Routers and shortest path

Chris Horn hornc at cray.com
Fri Oct 13 12:54:43 PDT 2017


I think the only way to do this today is to assign the clients in each “islet” a unique LNet. What problems did that cause for you (besides the administrative headache)?
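
To make the per-islet LNet idea concrete, here is a rough Python sketch of why it helps. It is only a toy model, not LNet code: the network names (o2ib1, o2ib2), router names, and NIDs are made up, and the table is a stand-in for whatever routes the servers would really carry. The point is that once each islet has its own network, the server looks up routes by the client's network and so naturally picks the islet-local router first.

```python
# Hypothetical server-side route table: destination LNet -> gateways in
# preference order. All names here are invented for illustration.
SERVER_ROUTES = {
    "o2ib1": ["router_islet1", "router_islet2"],  # islet 1 clients
    "o2ib2": ["router_islet2", "router_islet1"],  # islet 2 clients
}


def reply_gateway(client_nid, routes, down=frozenset()):
    """Pick the first live gateway listed for the client's own network."""
    # A NID has the form address@network; the network selects the route list.
    net = client_nid.split("@", 1)[1]
    for gateway in routes[net]:
        if gateway not in down:
            return gateway
    raise LookupError(f"no live gateway for network {net}")


# Each islet's replies leave through that islet's router...
print(reply_gateway("10.1.0.5@o2ib1", SERVER_ROUTES))
print(reply_gateway("10.2.0.7@o2ib2", SERVER_ROUTES))
# ...and fall back to the other router only when the local one is down.
print(reply_gateway("10.2.0.7@o2ib2", SERVER_ROUTES, down={"router_islet2"}))
```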

Chris Horn

On 10/13/17, 9:51 AM, "lustre-discuss on behalf of LOPEZ, ALEXANDRE" <lustre-discuss-bounces at lists.lustre.org on behalf of alexandre.lopez at atos.net> wrote:

    Hi Sebastien.
    
    It is in fact an asymmetric routing problem. But the way routes are declared today in Lustre makes it quite difficult to avoid in this particular context.
    
    I was considering adding a flag, a special route, or some similar mechanism to force LNet to return the response through the same router the request arrived from. However, since I started looking at Lustre's code today for the very first time, it will take quite some time before I have something useful. I don't even know whether this is actually possible. If it ever works out, I'll be glad to contribute it.
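
The behaviour wished for above can be sketched in a few lines of Python. This is purely illustrative and says nothing about whether LNet can be made to do it: the class, its methods, and the router and client names are all hypothetical. The node remembers which router each peer's last request came through and sends the reply back the same way, falling back to a default gateway for peers it has not heard from.

```python
class ReplyPathCache:
    """Toy model: reply through the router the request arrived from."""

    def __init__(self, default_gateway):
        self.default_gateway = default_gateway
        self.last_gateway = {}  # peer -> router its last request came through

    def on_request(self, peer, arrived_via):
        # Record the arrival router so the reply can retrace the path.
        self.last_gateway[peer] = arrived_via

    def gateway_for_reply(self, peer):
        # Unknown peers use the default (for example, the top-priority route).
        return self.last_gateway.get(peer, self.default_gateway)


server = ReplyPathCache(default_gateway="router_islet1")
server.on_request("client_islet2", arrived_via="router_islet2")
print(server.gateway_for_reply("client_islet2"))  # same router as the request
print(server.gateway_for_reply("client_islet3"))  # no request seen: default
```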
    
    Cheers,
    Alejandro
    
    -----Original Message-----
    From: Sebastien Buisson [mailto:sbuisson at ddn.com] 
    Sent: Friday, October 13, 2017 3:42 PM
    To: LOPEZ, ALEXANDRE
    Cc: Lustre Discuss (lustre-discuss at lists.lustre.org)
    Subject: Re: [lustre-discuss] Routers and shortest path
    
    Hi Alejandro!
    
    This makes me think of an asymmetric routing problem. It could be addressed by implementing something like reverse path filtering (http://tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.kernel.rpf.html) in LNet: nodes would not accept requests from peers through router B when they are configured to talk to those peers through router A only.
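
As a rough illustration of the reverse path filtering idea, here is a minimal Python sketch of the strict check. It is not LNet code and not the Linux implementation: the function, the table, and the peer and router names are hypothetical. A message is accepted only when it arrives through the router this node would itself use to reach the sender.

```python
def accept(configured_gateway, peer, arrived_via):
    """Strict reverse-path check for one incoming message.

    configured_gateway maps each peer to the only router this node is
    configured to use for it (a hypothetical, simplified table).
    """
    expected = configured_gateway.get(peer)
    # Drop messages from peers with no route, or arriving via another router.
    return expected is not None and arrived_via == expected


configured = {
    "client_islet1": "router_A",
    "client_islet2": "router_B",
}

print(accept(configured, "client_islet1", arrived_via="router_A"))  # matches
print(accept(configured, "client_islet1", arrived_via="router_B"))  # asymmetric
```

Note that a check like this only rejects asymmetric traffic. It does not by itself make replies take the shorter path.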
    
    That is, if there is no other ready-to-use solution and you are willing to contribute code :)
    
    Cheers,
    Sebastien.
    
    > On 13 Oct 2017, at 15:20, LOPEZ, ALEXANDRE <alexandre.lopez at atos.net> wrote:
    > 
    > Hi everyone,
    >  
    > I’d like to have your opinion on a problem I’m facing. Sorry for the long mail but I failed to make it shorter without removing some important information.
    >  
    > Each islet on my cluster has a dedicated Lustre router connected to the interconnect and to a dedicated network where Lustre servers are reachable. Lustre servers are NOT on the main interconnect, thus the need for routers. Any router is reachable thru the interconnect from any node but, when the node and the router aren’t on the same islet, several switches (hops) need to be crossed. The idea is to use the shortest path to the servers thru the islet-local router.
    >  
    > I created the appropriate routes on each compute node to contact the islet-local Lustre router. There is also a lower-priority route to fail over to a router on another islet in case the local Lustre router fails. (This could also have been done with the route’s hops, but my understanding is that the final result is the same.) I also created the routes on the Lustre servers for the responses to reach the clients thru the routers.
    >  
    > This seems to work as expected, but it actually does not.
    >  
    > Although the filesystem is mounted on the clients and works, there is a problem when there is no failure (all routers are up). The problem is rooted in the routes used to deliver the responses from the servers. If I assign priorities to the routes on the servers, the higher priority route will always be used to send the responses. So, if a compute node sent a request thru its islet’s router (the shortest path), the response will not return thru the same router but thru the one designated by the higher priority route, making the return path longer. Using hops is the same thing: the route with the lower hop value is chosen, but the same set of routes applies to all the nodes on all the islets, and a value that is right for one islet is wrong for all the others. If I assign neither priority nor hops, round-robin will be used and the next route on the list is selected.
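
The selection behaviour described in the paragraph above can be modelled in a few lines of Python. This is a toy model built from that description, not LNet's actual code: the class and router names are hypothetical, and it assumes that a lower numeric priority or hop value is the preferred one. What it shows is that the router a request arrived through plays no part in choosing the router for the reply.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Route:
    gateway: str       # hypothetical router name
    priority: int = 0  # assumption: lower value is preferred
    hops: int = 1      # assumption: lower value is preferred


class RouteTable:
    """Toy model of the selection described above, not LNet's actual code."""

    def __init__(self, routes):
        self.routes = list(routes)
        self._turn = count()

    def pick(self):
        # Keep only the routes with the best (priority, hops) pair...
        best = min((r.priority, r.hops) for r in self.routes)
        candidates = [r for r in self.routes if (r.priority, r.hops) == best]
        # ...and rotate among them when several are tied (round-robin).
        return candidates[next(self._turn) % len(candidates)]


def reply_gateway(table, arrived_via):
    # arrived_via is deliberately unused: the table alone decides.
    return table.pick().gateway


prioritized = RouteTable([
    Route("router_islet1", priority=0),
    Route("router_islet2", priority=1),
])
# The request came thru islet 2's router, yet the reply uses islet 1's.
print(reply_gateway(prioritized, arrived_via="router_islet2"))

tied = RouteTable([Route("router_islet1"), Route("router_islet2")])
# With no priorities or hops set, replies simply alternate between routers.
print([reply_gateway(tied, arrived_via="router_islet2") for _ in range(4)])
```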
    >  
    > The ideal solution would be for the response to follow the reverse path followed by the request (thru the same router) but I found no way to do it.
    >  
    > Is there any way to make the responses go the reverse (shortest) path?
    >  
    > Any other way to solve this?
    >  
    > I considered assigning a separate Lustre network to each islet but, although this solves this problem, it adds new ones; so I ended up discarding it.
    >  
    > I’m currently using Lustre 2.7 but I found nothing suggesting that 2.10 will solve the problem.
    >  
    > Thanks for your time and answers.
    >  
    > Alexandre Lopez
    > Big Data & Security – Data Management
    > Bull SAS – Atos Technologies
    >  
    >  
    >  
    > _______________________________________________
    > lustre-discuss mailing list
    > lustre-discuss at lists.lustre.org
    > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
    
    


