[lustre-devel] [PATCH 05/14] lnet: Router ping timeout with discovery disabled

James Simmons jsimmons at infradead.org
Mon May 3 17:10:07 PDT 2021

From: Chris Horn <chris.horn at hpe.com>

Discovery pings are used to determine the health of gateways and
associated routes. Ping replies from gateways with dynamic discovery
(DD) disabled (or if DD is disabled locally) are handled in
a special routine, lnet_router_discovery_ping_reply(), but this
function and related code do not handle the case where a discovery
ping hits the response tracker timeout and is unlinked by the
monitor thread. In this case, an UNLINK event is generated and we
do not call lnet_router_discovery_ping_reply(). For gateways
with DD enabled (and DD enabled locally), we handle this case
in lnet_router_discovery_complete(). If discovery failed then
lp_dc_error is set and we mark all routes down for the gateway. We
can simply extend this logic to the case of gateways with DD disabled
(or DD disabled locally).
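
To illustrate the reordering, here is a minimal standalone sketch of the
decision made in the completion path before and after this change. It is
a model only, not the kernel code: enum gw_action, complete_before() and
complete_after() are hypothetical names, dc_error stands in for
lp_dc_error, and dd_disabled stands in for the result of
lnet_is_discovery_disabled_locked(). The assumption that the success
path with DD enabled goes on to update per-route state is taken from the
comment in the hunk below.

```c
#include <stdbool.h>

/* Hypothetical outcomes of the completion routine; illustrative names only. */
enum gw_action {
	GW_SKIPPED,          /* early return: ping reply handling owns route state */
	GW_UPDATE_ROUTES,    /* discovery succeeded: continue with per-route updates */
	GW_MARK_ROUTES_DOWN, /* discovery failed: mark all routes for the gateway down */
};

/* Model of the old ordering: the DD-disabled check ran before the error check,
 * so a timed-out ping on a DD-disabled gateway returned without touching routes.
 */
static enum gw_action complete_before(int dc_error, bool dd_disabled)
{
	if (dd_disabled)
		return GW_SKIPPED;
	if (!dc_error)
		return GW_UPDATE_ROUTES;
	return GW_MARK_ROUTES_DOWN;
}

/* Model of the new ordering: the DD-disabled check only applies on success,
 * so any failure (including a timeout) reaches the mark-down path.
 */
static enum gw_action complete_after(int dc_error, bool dd_disabled)
{
	if (!dc_error) {
		if (dd_disabled)
			return GW_SKIPPED;
		return GW_UPDATE_ROUTES;
	}
	return GW_MARK_ROUTES_DOWN;
}
```

In this model the only input whose outcome changes is a failed ping with
DD disabled: complete_before() returns GW_SKIPPED while complete_after()
returns GW_MARK_ROUTES_DOWN. The other three combinations behave the
same in both versions.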

Fixes: dc80207e3a ("lnet: fix asym routing with multi-hop")
HPE-bug-id: LUS-9612
WC-bug-id: https://jira.whamcloud.com/browse/LU-14206
Lustre-commit: 173d86c6e9a704a8 ("LU-14206 lnet: Router ping timeout with discovery disabled")
Signed-off-by: Chris Horn <chris.horn at hpe.com>
Reviewed-on: https://review.whamcloud.com/40923
Reviewed-by: Cyril Bordage <cbordage at whamcloud.com>
Reviewed-by: James Simmons <jsimmons at infradead.org>
Reviewed-by: Oleg Drokin <green at whamcloud.com>
Signed-off-by: James Simmons <jsimmons at infradead.org>
 net/lnet/lnet/router.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c
index ae7582ca..e179997 100644
--- a/net/lnet/lnet/router.c
+++ b/net/lnet/lnet/router.c
@@ -495,11 +495,11 @@ bool lnet_is_route_alive(struct lnet_route *route)
 	lp->lp_alive = lp->lp_dc_error == 0;
-	/* ping replies are being handled when discovery is disabled */
-	if (lnet_is_discovery_disabled_locked(lp))
-		return;
 	if (!lp->lp_dc_error) {
+		/* ping replies are being handled when discovery is disabled */
+		if (lnet_is_discovery_disabled_locked(lp))
+			return;
 		/* mark single-hop routes. If the remote net is not configured
 		 * on the gateway we assume this is intentional and we mark the
 		 * gateway as multi-hop
