[lustre-devel] [PATCH 320/622] lnet: fix cpt locking

James Simmons jsimmons at infradead.org
Thu Feb 27 13:13:08 PST 2020


From: Amir Shehata <ashehata at whamcloud.com>

In lnet_select_pathway() the call to lnet_handle_send_case_locked()
can change sd_cpt. If this function returns REPEAT_SEND, we jump back
to the again label. At that point discovery may be initiated, which
unlocks the cpt. If the local cpt variable is not updated, we could
end up manipulating the wrong cpt, resulting in corruption or a
deadlock.
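The hazard described above can be illustrated with a small single-threaded model. This is a sketch under stated assumptions, not the LNet code. A flag array stands in for the per-CPT locks, and every name in it is hypothetical: NUM_CPTS, cpt_held, model_lock(), model_unlock(), struct send_data, handle_send_locked(), select_path() and SEND_DONE. The value REPEAT_SEND is borrowed from the patch but is redefined locally here. The helper may drop one partition lock, take another and record the switch in sd_cpt. The caller refreshes its local cpt before retrying or unlocking, which mirrors the one-line fix in the diff below.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_CPTS 4	/* hypothetical number of CPU partitions */

/* Hypothetical return codes; REPEAT_SEND mirrors the name used in the patch. */
enum { SEND_DONE = 0, REPEAT_SEND = 1 };

/* Model of per-CPT locks: true while that partition's lock is "held".
 * Single-threaded flags replace the real lnet_net_lock()/lnet_net_unlock()
 * so lock-ownership mistakes can be caught with assert(). */
static bool cpt_held[NUM_CPTS];

static void model_lock(int cpt)
{
	assert(!cpt_held[cpt]);		/* double lock would deadlock */
	cpt_held[cpt] = true;
}

static void model_unlock(int cpt)
{
	assert(cpt_held[cpt]);		/* releasing a lock we do not hold */
	cpt_held[cpt] = false;
}

/* Hypothetical stand-in for the send state: only the cpt field matters. */
struct send_data {
	int sd_cpt;
};

/* Hypothetical helper: if it needs a different partition, it drops the
 * current lock, takes the new one, records it in sd_cpt and asks the
 * caller to repeat, roughly like the discovery path in the commit text. */
static int handle_send_locked(struct send_data *sd, int target_cpt)
{
	if (sd->sd_cpt == target_cpt)
		return SEND_DONE;
	model_unlock(sd->sd_cpt);
	model_lock(target_cpt);
	sd->sd_cpt = target_cpt;
	return REPEAT_SEND;
}

/* Caller pattern with the fix applied: refresh the local cpt after the
 * helper, before retrying or unlocking. Returns the cpt finally released. */
static int select_path(int start_cpt, int target_cpt)
{
	struct send_data sd = { .sd_cpt = start_cpt };
	int cpt = start_cpt;
	int rc;

	model_lock(cpt);
again:
	/* Code after the label works under the lock named by 'cpt'. */
	assert(cpt_held[cpt]);
	rc = handle_send_locked(&sd, target_cpt);
	cpt = sd.sd_cpt;		/* the fix: keep the local copy in sync */
	if (rc == REPEAT_SEND)
		goto again;
	model_unlock(cpt);
	return cpt;
}
```

In this model, deleting the `cpt = sd.sd_cpt;` line makes the assert after `again:` fire on the retry. The model then holds only the target partition's lock while the caller still refers to the starting one, which is the mismatch the patch guards against.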

WC-bug-id: https://jira.whamcloud.com/browse/LU-12163
Lustre-commit: f6d63067e1ec ("LU-12163 lnet: fix cpt locking")
Signed-off-by: Amir Shehata <ashehata at whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34607
Reviewed-by: Olaf Weber <olaf.weber at hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson at ddn.com>
Reviewed-by: Chris Horn <hornc at cray.com>
Signed-off-by: James Simmons <jsimmons at infradead.org>
---
 net/lnet/lnet/lib-move.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 8eeb5ec..0ee3a55 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -2390,10 +2390,15 @@ struct lnet_ni *
 
 	rc = lnet_handle_send_case_locked(&send_data);
 
+	/* Update the local cpt since send_data.sd_cpt might've been
+	 * updated as a result of calling lnet_handle_send_case_locked().
+	 */
+	cpt = send_data.sd_cpt;
+
 	if (rc == REPEAT_SEND)
 		goto again;
 
-	lnet_net_unlock(send_data.sd_cpt);
+	lnet_net_unlock(cpt);
 
 	return rc;
 }
-- 
1.8.3.1
