[lustre-devel] [PATCH 382/622] lnet: fix peer ref counting
James Simmons
jsimmons at infradead.org
Thu Feb 27 13:14:10 PST 2020
From: Amir Shehata <ashehata at whamcloud.com>
Exit from the loop only after the peer ref count has been
incremented, so that the ref count stays correct.
The code makes sure that a peer is queued for discovery at most
once if discovery is disabled. This allows discovery to be used
as a standard ping for gateways which do not have the discovery
feature or have discovery disabled.
WC-bug-id: https://jira.whamcloud.com/browse/LU-9971
Lustre-commit: dbcddb4824f0 ("LU-9971 lnet: fix peer ref counting")
Signed-off-by: Amir Shehata <ashehata at whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35446
Reviewed-by: Olaf Weber <olaf.weber at hpe.com>
Reviewed-by: Chris Horn <hornc at cray.com>
Reviewed-by: Oleg Drokin <green at whamcloud.com>
Signed-off-by: James Simmons <jsimmons at infradead.org>
---
net/lnet/lnet/peer.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)
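
The intended control flow can be sketched as a small standalone C model.
This is only an illustration under stated assumptions, not the kernel
code: struct model_peer, model_discover() and rounds_until_uptodate are
hypothetical names, and locking, wait queues and ref counting are left
out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct lnet_peer; not the real layout. */
struct model_peer {
	bool discovery_disabled;   /* models lnet_is_discovery_disabled(lp) */
	bool dc_error;             /* models lp->lp_dc_error */
	int rounds_until_uptodate; /* wake-ups needed before the peer is up to date */
};

/* Returns how many times the modelled peer was queued for discovery. */
static int model_discover(struct model_peer *p, bool block)
{
	int count = 0;

	for (;;) {
		/* Guard sits before queueing and only fires on a repeat. */
		if (p->discovery_disabled && count > 0)
			break;
		if (p->dc_error)
			break;
		/* models lnet_peer_is_uptodate(lp) */
		if (p->rounds_until_uptodate <= 0)
			break;
		/* models lnet_peer_queue_for_discovery(lp) */
		count++;
		/* Non-blocking callers return right after queueing. */
		if (!block)
			break;
		/* models sleeping until the discovery thread wakes us */
		p->rounds_until_uptodate--;
	}
	/* With discovery disabled the peer is queued at most once. */
	assert(!p->discovery_disabled || count <= 1);
	return count;
}
```

In this model a blocking call with discovery disabled queues the peer
once and then leaves the loop at the guard, while a blocking call with
discovery enabled keeps queueing until the modelled peer is up to date.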
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index d167a37..e33dc0e 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -2138,6 +2138,7 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp)
DEFINE_WAIT(wait);
struct lnet_peer *lp;
int rc = 0;
+ int count = 0;
again:
lnet_net_unlock(cpt);
@@ -2157,11 +2158,20 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp)
break;
if (the_lnet.ln_dc_state != LNET_DC_STATE_RUNNING)
break;
+ /* Don't repeat discovery if discovery is disabled. This is
+ * done to ensure we can use discovery as a standard ping as
+ * well for backwards compatibility with routers which do not
+ * have discovery or have discovery disabled
+ */
+ if (lnet_is_discovery_disabled(lp) && count > 0)
+ break;
if (lp->lp_dc_error)
break;
if (lnet_peer_is_uptodate(lp))
break;
lnet_peer_queue_for_discovery(lp);
+ count++;
+ CDEBUG(D_NET, "Discovery attempt # %d\n", count);
/* If caller requested a non-blocking operation then
* return immediately. Once discovery is complete any
@@ -2178,15 +2188,6 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp)
lnet_peer_decref_locked(lp);
/* Peer may have changed */
lp = lpni->lpni_peer_net->lpn_peer;
-
- /* Wait for discovery to complete, but don't repeat if
- * discovery is disabled. This is done to ensure we can
- * use discovery as a standard ping as well for backwards
- * compatibility with routers which do not have discovery
- * or have discovery disabled
- */
- if (lnet_is_discovery_disabled(lp))
- break;
}
finish_wait(&lp->lp_dc_waitq, &wait);
--
1.8.3.1