[lustre-discuss] LNET Routing Question

John Fragalla jfragalla at cray.com
Sun May 13 20:02:02 PDT 2018


Makia,

When I tested a similar configuration with routed and non-routed 
clients, I found that all clients, routed and non-routed, should have 
the same lnet.conf parameters defined, apart from the node-specific 
ip2nets and routes params.  When the non-routed clients did not have 
the same settings as the routed clients, IO would hang when I failed a 
router node, and that affected the non-routed clients.  When all 
clients had the same parameters and I ran the exact same test, IO 
continued for both routed and non-routed clients.  I even tested 
bringing down all router nodes, and the non-routed clients were not 
affected.
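
A minimal sketch of how that consistency could be checked, assuming the 
settings live in modprobe-style "options lnet ..." lines (on some 
systems they are in a YAML lnet.conf instead, in which case the parsing 
would differ). The helper names and the sample values are hypothetical 
and not taken from the tested setup; only ip2nets and routes are 
treated as node-specific, as described above.

```python
import shlex

# Settings that are expected to differ per node (assumption: only these two).
NODE_SPECIFIC = {"ip2nets", "routes"}


def parse_lnet_options(text):
    """Collect key=value pairs from 'options lnet ...' lines."""
    params = {}
    for line in text.splitlines():
        tokens = shlex.split(line, comments=True)  # handles quoted values
        if tokens[:2] != ["options", "lnet"]:
            continue
        for token in tokens[2:]:
            key, _, value = token.partition("=")
            params[key] = value
    return params


def mismatched_params(configs):
    """Return {param: {client: value}} for shared params that differ.

    A client that does not set a parameter at all shows up as None.
    """
    shared = {
        name: {k: v for k, v in params.items() if k not in NODE_SPECIFIC}
        for name, params in configs.items()
    }
    all_keys = set().union(*(p.keys() for p in shared.values()))
    result = {}
    for key in sorted(all_keys):
        values = {name: p.get(key) for name, p in shared.items()}
        if len(set(values.values())) > 1:
            result[key] = values
    return result


# Illustrative values only.
ROUTED = (
    'options lnet ip2nets="tcp0(eth0) 10.10.*.*" '
    'routes="o2ib0 10.10.0.[1-2]@tcp0" '
    "check_routers_before_use=1 live_router_check_interval=60"
)
NON_ROUTED = 'options lnet ip2nets="tcp0(eth0) 10.10.*.*" check_routers_before_use=1'

if __name__ == "__main__":
    configs = {
        "routed": parse_lnet_options(ROUTED),
        "non_routed": parse_lnet_options(NON_ROUTED),
    }
    for param, values in mismatched_params(configs).items():
        print(param, values)
```

Here the non-routed client would be flagged for not setting 
live_router_check_interval, which is the kind of difference to remove 
before running a router failover test.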

The other aspect I tested was 1MB and 4MB RPC sizes.  The RPC size has 
to be the same on all clients.  I have not tested 16MB RPC yet.  I also 
used the same client tuning across all clients, routed and non-routed.
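
For reference, a small sketch of the arithmetic behind those RPC sizes, 
assuming the size is set through the client's max_pages_per_rpc tunable 
and that the client page size is 4 KiB. Both are assumptions here; the 
page size differs on some architectures.

```python
PAGE_SIZE = 4096  # bytes; assumed 4 KiB client page size


def pages_per_rpc(rpc_mib, page_size=PAGE_SIZE):
    """Number of pages needed for an RPC of rpc_mib MiB."""
    return rpc_mib * 1024 * 1024 // page_size


for size_mib in (1, 4, 16):
    print(f"{size_mib} MiB RPC -> {pages_per_rpc(size_mib)} pages")
```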

I suggest you test this configuration on a small setup, if you can, to 
verify before production use.

Thanks.

jnf

--
John Fragalla
Senior Storage Engineer
High Performance Computing
Cray Inc.
jfragalla at cray.com
+1-951-258-7629

On 5/9/18 6:50 AM, Makia Minich wrote:
> Hello all,
>
> I have an LNET routing question. I’ve attached a quick diagram of the 
> current setup; but basically I have two core networks (one infiniband 
> and one ethernet) with a set of LNET routers in between. There is 
> storage and clients on both sides of these routers and all clients 
> need to see all/most storage. All connections, configurations, etc. 
> are working.
>
> The question is, if an LNET router goes down (which does cause some 
> amount of reconnect or remapping for any clients attempting to use 
> those routes) would this cause any issues or delays for a client’s 
> connection to non-routed storage? Put slightly differently, if a job on 
> the ethernet clients is actively using ethernet storage and the lnet 
> routers go down, will the job be affected? What about a new job just 
> launching when that lnet router is down?
>
> In addition, what does “check_routers_before_use” actually do and does 
> it change the scenarios I mentioned? (e.g. If an ethernet client has 
> “check_routers_before_use” would every file request start with a ping 
> to the routers even if it’s not leaving its core network?)
>
> Thanks!
>
>
>
> Makia Minich
> Principal Architect
> System Fabric Works
> “Fabric Computing that Works”
>
> “Oh, I don't know. I think everything is just as it should be, y'know?”
> - Frank Fairfield
>
>
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: lustre_routing.png
Type: image/png
Size: 22842 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20180513/7f2b035e/attachment-0001.png>