[lustre-devel] [PATCH 26/27] lnet: Check if discovery toggled off in ping reply

James Simmons jsimmons at infradead.org
Sun Jun 13 16:11:36 PDT 2021


From: Chris Horn <chris.horn at hpe.com>

If a peer is initially discovered and found to have discovery
enabled, but the peer later reloads LNet with discovery disabled,
then we can delete the peer and re-create it the next time the peer
is discovered.

It is safe to delete and re-create the peer as long as it wasn't
configured manually.

In lnet_peer_deletion(), we need to use lnet_del_init() when removing
the peer from the discovery queue because the lnet_peer_del() code
path can result in a call to lnet_peer_queue_for_discovery() where
we check if the lp_dc_list is empty.

HPE-bug-id: LUS-9178
Fixes: 7ec94557b1 ("lnet: Prevent discovery on peer marked deletion")
WC-bug-id: https://jira.whamcloud.com/browse/LU-14661
Lustre-commit: 143893381d428466 ("LU-14661 lnet: Check if discovery toggled off in ping reply")
Signed-off-by: Chris Horn <chris.horn at hpe.com>
Reviewed-on: https://review.whamcloud.com/43508
Reviewed-by: Serguei Smirnov <ssmirnov at whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko at hpe.com>
Reviewed-by: Oleg Drokin <green at whamcloud.com>
Signed-off-by: James Simmons <jsimmons at infradead.org>
---
 net/lnet/lnet/peer.c | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 7630aff..2fc784d 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -2254,22 +2254,34 @@ void lnet_peer_push_event(struct lnet_event *ev)
 	/* The peer may have discovery disabled at its end. Set
 	 * NO_DISCOVERY as appropriate.
 	 */
-	if (!(pbuf->pb_info.pi_features & LNET_PING_FEAT_DISCOVERY)) {
+	if (!(pbuf->pb_info.pi_features & LNET_PING_FEAT_DISCOVERY) ||
+	    lnet_peer_discovery_disabled) {
 		CDEBUG(D_NET, "Peer %s has discovery disabled\n",
 		       libcfs_nid2str(lp->lp_primary_nid));
-		/* Mark the peer for deletion if we already know about it
-		 * and it's going from discovery set to no discovery set
+
+		/* Detect whether this peer has toggled discovery from on to
+		 * off and whether we can delete and re-create the peer. Peers
+		 * that were manually configured cannot be deleted by discovery.
+		 * We need to delete this peer and re-create it if the peer was
+		 * not configured manually, is currently considered DD capable,
+		 * and either:
+		 * 1. We've already discovered the peer (the peer has toggled
+		 *    the discovery feature from on to off), or
+		 * 2. The peer is considered MR, but it was not user configured
+		 *    (this was a "temporary" peer created via the kernel APIs
+		 *     that we're discovering for the first time)
 		 */
-		if (!(lp->lp_state & (LNET_PEER_NO_DISCOVERY |
-				      LNET_PEER_DISCOVERING)) &&
-		    lp->lp_state & LNET_PEER_DISCOVERED) {
+		if (!(lp->lp_state & (LNET_PEER_CONFIGURED |
+				      LNET_PEER_NO_DISCOVERY)) &&
+		    (lp->lp_state & (LNET_PEER_DISCOVERED |
+				     LNET_PEER_MULTI_RAIL))) {
 			CDEBUG(D_NET, "Marking %s:0x%x for deletion\n",
 			       libcfs_nid2str(lp->lp_primary_nid),
 			       lp->lp_state);
 			lp->lp_state |= LNET_PEER_MARK_DELETION;
 		}
 		lp->lp_state |= LNET_PEER_NO_DISCOVERY;
-	} else if (lp->lp_state & LNET_PEER_NO_DISCOVERY) {
+	} else {
 		CDEBUG(D_NET, "Peer %s has discovery enabled\n",
 		       libcfs_nid2str(lp->lp_primary_nid));
 		lp->lp_state &= ~LNET_PEER_NO_DISCOVERY;
@@ -3083,7 +3095,7 @@ static int lnet_peer_deletion(struct lnet_peer *lp)
 	 * of deleting it.
 	 */
 	if (!list_empty(&lp->lp_dc_list))
-		list_del(&lp->lp_dc_list);
+		list_del_init(&lp->lp_dc_list);
 	list_for_each_entry_safe(route, tmp,
 				 &lp->lp_routes,
 				 lr_gwlist)
-- 
1.8.3.1



More information about the lustre-devel mailing list