[lustre-devel] [PATCH 08/25] lnet: Protect lpni deref in lnet_health_check

James Simmons jsimmons at infradead.org
Mon Aug 2 12:50:28 PDT 2021


From: Chris Horn <chris.horn at hpe.com>

Discovery thread can modify peer NI/peer net/peer relationship
so we need to be careful when dereferencing the peer NI pointer in
lnet_health_check(). Discovery thread operations under net lock, so
move the peer NI dereference under the net lock which is taken for
incrementing the health stats.

Move some of the other code that is only relevant for messages with a
health status != LNET_MSG_STATUS_OK under the appropriate condition.

HPE-bug-id: LUS-9962
WC-bug-id: https://jira.whamcloud.com/browse/LU-14655
Lustre-commit: d87af24452a2e883 ("LU-14655 lnet: Protect lpni deref in lnet_health_check")
Signed-off-by: Chris Horn <chris.horn at hpe.com>
Reviewed-on: https://review.whamcloud.com/43503
Reviewed-by: Alexander Boyko <alexander.boyko at hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov at whamcloud.com>
Reviewed-by: Oleg Drokin <green at whamcloud.com>
Signed-off-by: James Simmons <jsimmons at infradead.org>
---
 net/lnet/lnet/lib-msg.c | 71 ++++++++++++++++++++++++++-----------------------
 1 file changed, 38 insertions(+), 33 deletions(-)

diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c
index 580ddf6..e471848 100644
--- a/net/lnet/lnet/lib-msg.c
+++ b/net/lnet/lnet/lib-msg.c
@@ -821,38 +821,6 @@
 		attempt_remote_resend = false;
 	}
 
-	/* Don't further decrement the health value if a recovery message
-	 * failed.
-	 */
-	if (msg->msg_recovery) {
-		handle_local_health = false;
-		handle_remote_health = false;
-	} else {
-		handle_local_health = false;
-		handle_remote_health = true;
-	}
-
-	/* For local failures, health/recovery/resends are not needed if I only
-	 * have a single (non-lolnd) interface. NB: pb_nnis includes the lolnd
-	 * interface, so a single-rail node would have pb_nnis == 2.
-	 */
-	if (the_lnet.ln_ping_target->pb_nnis <= 2) {
-		handle_local_health = false;
-		attempt_local_resend = false;
-	}
-
-	/* For remote failures, health/recovery/resends are not needed if the
-	 * peer only has a single interface. Special case for routers where we
-	 * rely on health feature to manage route aliveness. NB: unlike pb_nnis
-	 * above, lp_nnis does _not_ include the lolnd, so a single-rail node
-	 * would have lp_nnis == 1.
-	 */
-	if (lpni && lpni->lpni_peer_net->lpn_peer->lp_nnis <= 1) {
-		attempt_remote_resend = false;
-		if (!lnet_isrouter(lpni))
-			handle_remote_health = false;
-	}
-
 	if (!lo)
 		LASSERT(ni && lpni);
 	else
@@ -865,11 +833,48 @@
 	       lnet_health_error2str(hstatus));
 
 	/* stats are only incremented for errors so avoid wasting time
-	 * incrementing statistics if there is no error.
+	 * incrementing statistics if there is no error. Similarly, whether to
+	 * update health values or perform resends is only applicable for
+	 * messages with a health status != OK.
 	 */
 	if (hstatus != LNET_MSG_STATUS_OK) {
+		/* Don't further decrement the health value if a recovery
+		 * message failed.
+		 */
+		if (msg->msg_recovery) {
+			handle_local_health = false;
+			handle_remote_health = false;
+		} else {
+			handle_local_health = true;
+			handle_remote_health = true;
+		}
+
+		/* For local failures, health/recovery/resends are not needed if
+		 * I only have a single (non-lolnd) interface. NB: pb_nnis
+		 * includes the lolnd interface, so a single-rail node would
+		 * have pb_nnis == 2.
+		 */
+		if (the_lnet.ln_ping_target->pb_nnis <= 2) {
+			handle_local_health = false;
+			attempt_local_resend = false;
+		}
+
 		lnet_net_lock(0);
 		lnet_incr_hstats(ni, lpni, hstatus);
+		/* For remote failures, health/recovery/resends are not needed
+		 * if the peer only has a single interface. Special case for
+		 * routers where we rely on health feature to manage route
+		 * aliveness. NB: unlike pb_nnis above, lp_nnis does _not_
+		 * include the lolnd, so a single-rail node would have
+		 * lp_nnis == 1.
+		 */
+		if (lpni && lpni->lpni_peer_net &&
+		    lpni->lpni_peer_net->lpn_peer &&
+		    lpni->lpni_peer_net->lpn_peer->lp_nnis <= 1) {
+			attempt_remote_resend = false;
+			if (!lnet_isrouter(lpni))
+				handle_remote_health = false;
+		}
 		lnet_net_unlock(0);
 	}
 
-- 
1.8.3.1



More information about the lustre-devel mailing list