[lustre-devel] [PATCH 18/24] lnet: Correct net selection for router ping
James Simmons
jsimmons at infradead.org
Mon Sep 5 18:55:31 PDT 2022
From: Chris Horn <chris.horn at hpe.com>
lnet_find_best_ni_on_local_net() contains logic for restricting
the NI selection to a net specified by lnet_peer::lp_disc_net_id. The
purpose of this is to ensure that LNet peers ping every interface on
a router at a regular interval as part of the LNet router health
feature. However, this logic is flawed because lnet_msg_discovery()
is used to determine whether the message being sent is a discovery
message, but that function actually determines whether a given message
can _trigger_ discovery.
Introduce a new function, lnet_msg_is_ping(), which determines whether
a given lnet_msg is a GET on the LNET_RESERVED_PORTAL.
Modify lnet_find_best_ni_on_local_net() to restrict NI selection to
lp_disc_net_id iff:
1. lp_disc_net_id is non-zero.
2. The peer has the LNET_PEER_RTR_DISCOVERY flag set.
3. lnet_msg_is_ping() returns true.
HPE-bug-id: LUS-11017
WC-bug-id: https://jira.whamcloud.com/browse/LU-15929
Lustre-commit: 2431e099b143a4c7e ("LU-15929 lnet: Correct net selection for router ping")
Signed-off-by: Chris Horn <chris.horn at hpe.com>
Reviewed-on: https://review.whamcloud.com/47527
Reviewed-by: Frank Sehr <fsehr at whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage at whamcloud.com>
Reviewed-by: Oleg Drokin <green at whamcloud.com>
Signed-off-by: James Simmons <jsimmons at infradead.org>
---
net/lnet/lnet/lib-move.c | 25 +++++++++++++++++++++----
1 file changed, 21 insertions(+), 4 deletions(-)
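
Note (not part of the patch): the selection change can be modelled outside
the kernel. The following is a minimal sketch, assuming simplified stand-in
types. Every sketch_* and SKETCH_* name is hypothetical and only mirrors the
fields and flags named in the commit message; the constant values are
illustrative. It contrasts the old condition, which keyed on whether a
message can trigger discovery, with the new three-part condition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the LNet types; not the kernel definitions. */
enum sketch_msg_type {
	SKETCH_MSG_ACK,
	SKETCH_MSG_PUT,
	SKETCH_MSG_GET,
	SKETCH_MSG_REPLY,
};

#define SKETCH_RESERVED_PORTAL		0u		/* illustrative value */
#define SKETCH_PEER_RTR_DISCOVERY	(1u << 0)	/* illustrative bit */

struct sketch_msg {
	enum sketch_msg_type type;
	uint32_t ptl_index;
};

struct sketch_peer {
	uint32_t disc_net_id;	/* models lnet_peer::lp_disc_net_id */
	uint32_t state;		/* models lnet_peer::lp_state */
};

/* Models lnet_msg_discovery(): can this message trigger discovery?
 * Assumed here: GET/PUT on the reserved portal and ACK/REPLY cannot.
 */
static bool sketch_msg_can_trigger_discovery(const struct sketch_msg *msg)
{
	bool reserved = (msg->type == SKETCH_MSG_GET ||
			 msg->type == SKETCH_MSG_PUT) &&
			msg->ptl_index == SKETCH_RESERVED_PORTAL;
	bool response = msg->type == SKETCH_MSG_ACK ||
			msg->type == SKETCH_MSG_REPLY;

	return !(reserved || response);
}

/* Models lnet_msg_is_ping(): a GET on the reserved portal. */
static bool sketch_msg_is_ping(const struct sketch_msg *msg)
{
	return msg->type == SKETCH_MSG_GET &&
	       msg->ptl_index == SKETCH_RESERVED_PORTAL;
}

/* Old condition: "discovery && peer->lp_disc_net_id". */
static bool sketch_old_restrict(const struct sketch_peer *peer,
				const struct sketch_msg *msg)
{
	return sketch_msg_can_trigger_discovery(msg) &&
	       peer->disc_net_id != 0;
}

/* New condition: all three checks from the commit message must hold. */
static bool sketch_new_restrict(const struct sketch_peer *peer,
				const struct sketch_msg *msg)
{
	return peer->disc_net_id != 0 &&
	       (peer->state & SKETCH_PEER_RTR_DISCOVERY) != 0 &&
	       sketch_msg_is_ping(msg);
}
```

In this model, a GET on the reserved portal sent to a router peer with
disc_net_id set satisfies only the new condition, while an ordinary PUT on
another portal satisfies only the old one. That is the mismatch the commit
message describes.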
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index ec8be8f..3c9602e 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -1577,7 +1577,8 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	return false;
 }
-/*
+/* Can the specified message trigger peer discovery?
+ *
  * Traffic to the LNET_RESERVED_PORTAL may not trigger peer discovery,
  * because such traffic is required to perform discovery. We therefore
  * exclude all GET and PUT on that portal. We also exclude all ACK and
@@ -1591,6 +1592,18 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	return !(lnet_reserved_msg(msg) || lnet_msg_is_response(msg));
 }
+/* Is the specified message an LNet ping?
+ */
+static bool
+lnet_msg_is_ping(struct lnet_msg *msg)
+{
+	if (msg->msg_type == LNET_MSG_GET &&
+	    msg->msg_hdr.msg.get.ptl_index == LNET_RESERVED_PORTAL)
+		return true;
+
+	return false;
+}
+
#define SRC_SPEC 0x0001
#define SRC_ANY 0x0002
#define LOCAL_DST 0x0004
@@ -2228,10 +2241,14 @@ struct lnet_ni *
 	u32 best_net_sel_prio = LNET_MAX_SELECTION_PRIORITY;
 	u32 net_sel_prio;
-	/* if this is a discovery message and lp_disc_net_id is
-	 * specified then use that net to send the discovery on.
+	/* If lp_disc_net_id is set, this peer is a router undergoing
+	 * discovery, and this message is an LNet ping, then this may be a
+	 * discovery message and we need to select an NI on the peer net
+	 * specified by lp_disc_net_id
 	 */
-	if (discovery && peer->lp_disc_net_id) {
+	if (peer->lp_disc_net_id &&
+	    (peer->lp_state & LNET_PEER_RTR_DISCOVERY) &&
+	    lnet_msg_is_ping(msg)) {
 		best_lpn = lnet_peer_get_net_locked(peer, peer->lp_disc_net_id);
 		if (best_lpn && lnet_get_net_locked(best_lpn->lpn_net_id))
 			goto select_best_ni;
--
1.8.3.1