[lustre-discuss] lustre-discuss Digest, Vol 228, Issue 3

Berry-Lozano, Erica erica.berry-lozano at hpe.com
Wed Mar 5 20:42:18 PST 2025


Please remove me from this email distribution list.  Thanks

-----Original Message-----
From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> On Behalf Of lustre-discuss-request at lists.lustre.org
Sent: Wednesday, March 5, 2025 2:05 PM
To: lustre-discuss at lists.lustre.org
Subject: lustre-discuss Digest, Vol 228, Issue 3

Send lustre-discuss mailing list submissions to
	lustre-discuss at lists.lustre.org

To subscribe or unsubscribe via the World Wide Web, visit
	http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
or, via email, send a message with subject or body 'help' to
	lustre-discuss-request at lists.lustre.org

You can reach the person managing the list at
	lustre-discuss-owner at lists.lustre.org

When replying, please edit your Subject line so it is more specific than "Re: Contents of lustre-discuss digest..."


Today's Topics:

   1. multi-hop routing (John White)
   2. Re: multi-hop routing (Horn, Chris)


----------------------------------------------------------------------

Message: 1
Date: Wed, 5 Mar 2025 11:14:49 -0800
From: John White <jwhite at lbl.gov>
To: lustre-discuss at lists.lustre.org
Subject: [lustre-discuss] multi-hop routing
Message-ID: <32AB4536-9924-41B8-8226-B3E3BF19F77E at lbl.gov>
Content-Type: text/plain;	charset=utf-8

Hello folks.  I have a rare situation that I'm told some centers are successfully pulling off and am looking for guidance - multi-hop lnet routing.
In short, I have 2 distinct o2ib fabrics at disparate geo sites joined by a routed ethernet fabric.  I'm looking to use a 2-lnet-router chain to plumb the two o2ib fabrics together.

servers on the left, clients on the right
o2ib0(10.5.0.0/16) <-> router(o2ib0,tcp0) <-> routed eth (10.37.0.0/16, 10.38.0.0/16) <-> router(tcp0,o2ib2) <-> o2ib2(10.6.0.0/16)

I have both sets of routers up but traffic absolutely fails the 2nd hop in either direction (I can `lctl ping` tcp0 from o2ib2 and o2ib0 but no further).

I've tried adding a route ON the routers, but that didn't help.

I've tried defining the 2nd hop on the client:
options lnet routes="tcp0 10.6.0.[250-251]@o2ib2;\
o2ib0 10.37.250.[162-163]@tcp0"

but that failed with the following kern message on lnet load:
74067:0:(router.c:644:lnet_add_route()) Cannot add route with gateway 10.37.250.162@tcp. There is no local interface configured on LNet tcp

Does anyone have any hints here?  It feels like I'm a syntax change or a routing hint away from getting this working.

------------------------------

Message: 2
Date: Wed, 5 Mar 2025 20:05:02 +0000
From: "Horn, Chris" <chris.horn at hpe.com>
To: John White <jwhite at lbl.gov>, "lustre-discuss at lists.lustre.org"
	<lustre-discuss at lists.lustre.org>
Subject: Re: [lustre-discuss] multi-hop routing
Message-ID:
	<PH7PR84MB1438479204DBEC1E027B8FC79ECB2 at PH7PR84MB1438.NAMPRD84.PROD.OUTLOOK.COM>
	
Content-Type: text/plain; charset="utf-8"

You need LNet routes configured on all nodes. It should look something like this:

# pdsh -w n0[0-3] 'lctl list_nids; lctl show_route' | dshbak -c
----------------
server
----------------
172.18.2.5@o2ib
net              o2ib2 hops 2 gw                  172.18.2.6@o2ib up pri 0
----------------
router1
----------------
172.18.2.6@o2ib
172.18.2.2@tcp
net              o2ib2 hops 1 gw                   172.18.2.3@tcp up pri 0
----------------
router2
----------------
172.18.2.7@o2ib2
172.18.2.3@tcp
net               o2ib hops 1 gw                   172.18.2.2@tcp up pri 0
----------------
client
----------------
172.18.2.8@o2ib2
net               o2ib hops 2 gw                 172.18.2.7@o2ib2 up pri 0
#
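
The pattern in the listing above is that every node's gateway is a router NID on one of that node's own networks, and the hop count is the number of routers still to cross. That also fits the "There is no local interface configured on LNet tcp" error in the original post: a client that only has an o2ib2 interface cannot use a gateway NID on tcp. Below is a minimal Python sketch that derives the same four routes from a description of the chain and prints them as `lnetctl route add` commands. The NIDs are the ones from the listing; the names NETS, ROUTERS, NODES, routes_for and lnetctl_commands are illustrative only, and the lnetctl flags should be checked against your Lustre version before use.

```python
# Linear chain from the listing above: o2ib <-> tcp <-> o2ib2.
# ROUTERS[k] joins NETS[k] and NETS[k + 1]; each entry is
# (NID on the left network, NID on the right network).
NETS = ["o2ib", "tcp", "o2ib2"]
ROUTERS = [
    ("172.18.2.6@o2ib", "172.18.2.2@tcp"),    # router1
    ("172.18.2.3@tcp", "172.18.2.7@o2ib2"),   # router2
]

# Which networks (by index into NETS) each node has a local interface on.
NODES = {"server": [0], "router1": [0, 1], "router2": [1, 2], "client": [2]}


def routes_for(local):
    """Return (dest_net, hops, gateway_nid) for each end network
    that is not local to a node sitting on the networks in `local`."""
    lo, hi = min(local), max(local)
    routes = []
    for dest in (0, len(NETS) - 1):
        if dest > hi:
            # Destination is to the right: next router's left-side NID.
            routes.append((NETS[dest], dest - hi, ROUTERS[hi][0]))
        elif dest < lo:
            # Destination is to the left: previous router's right-side NID.
            routes.append((NETS[dest], lo - dest, ROUTERS[lo - 1][1]))
    return routes


def lnetctl_commands(local):
    """Format the routes as lnetctl commands (flag names are an assumption
    to verify against the installed lnetctl)."""
    return [
        f"lnetctl route add --net {net} --gateway {gw} --hop {hops}"
        for net, hops, gw in routes_for(local)
    ]


if __name__ == "__main__":
    for name, local in NODES.items():
        print(name)
        for cmd in lnetctl_commands(local):
            print("  " + cmd)
```

The sketch only emits routes between the two end networks, which mirrors the listing; whether the endpoints also need a route to the intermediate tcp network is not covered here.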

Chris Horn

------------------------------

End of lustre-discuss Digest, Vol 228, Issue 3
**********************************************