[lustre-devel] [PATCH 38/42] lnet: deadlock on LNet shutdown

James Simmons jsimmons at infradead.org
Mon Oct 5 17:06:17 PDT 2020


From: Serguei Smirnov <ssmirnov at whamcloud.com>

Release ln_api_mutex during LNet shutdown while waiting for
a zombie LNI, so that other threads can take the mutex, read
the LNet state updated by the shutdown, and fall through.
This avoids the deadlock.

WC-bug-id: https://jira.whamcloud.com/browse/LU-12233
Lustre-commit: e0c445648a38fb ("LU-12233 lnet: deadlock on LNet shutdown")
Signed-off-by: Serguei Smirnov <ssmirnov at whamcloud.com>
Reviewed-on: https://review.whamcloud.com/39933
Reviewed-by: Chris Horn <chris.horn at hpe.com>
Reviewed-by: Amir Shehata <ashehata at whamcloud.com>
Reviewed-by: Oleg Drokin <green at whamcloud.com>
Signed-off-by: James Simmons <jsimmons at infradead.org>
---
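Note: the locking pattern described above can be sketched in user space
as follows. This is only an illustration under stated assumptions, not
the LNet code: api_mutex, shutting_down, zombie_refs, worker() and
shutdown_demo() are hypothetical stand-ins, pthreads replace the kernel
mutex, and sched_yield() replaces the one-second
schedule_timeout_uninterruptible(HZ) sleep.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the_lnet.ln_api_mutex and the LNet state. */
static pthread_mutex_t api_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool shutting_down;	/* protected by api_mutex */
static int zombie_refs;		/* protected by api_mutex */

/* A thread that still pins the "NI" and needs api_mutex to let go of it. */
static void *worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&api_mutex);
	if (shutting_down)	/* read the updated state and fall through */
		zombie_refs--;
	pthread_mutex_unlock(&api_mutex);
	return NULL;
}

/* Returns the number of wait iterations, or -1 if no thread was created. */
static int shutdown_demo(void)
{
	pthread_t tid;
	int waits = 0;

	pthread_mutex_lock(&api_mutex);
	shutting_down = true;
	zombie_refs = 1;

	if (pthread_create(&tid, NULL, worker, NULL) != 0) {
		pthread_mutex_unlock(&api_mutex);
		return -1;
	}

	while (zombie_refs > 0) {
		/* Drop the mutex while waiting. If it stayed held here,
		 * worker() could never run and this loop would spin forever.
		 */
		pthread_mutex_unlock(&api_mutex);
		sched_yield();
		waits++;
		pthread_mutex_lock(&api_mutex);
	}

	assert(zombie_refs == 0);
	pthread_mutex_unlock(&api_mutex);
	pthread_join(tid, NULL);
	return waits;
}
```

Calling shutdown_demo() once from main() should return a count of at
least 1, because the worker cannot take the mutex until the first
unlock inside the loop. Build with -pthread.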
 net/lnet/lnet/api-ni.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index f678ae2..03473bf 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -2036,13 +2036,21 @@ static void lnet_push_target_fini(void)
 		}
 
 		if (!list_empty(&ni->ni_netlist)) {
+			/* Unlock mutex while waiting to allow other
+			 * threads to read the LNet state and fall through
+			 * to avoid deadlock
+			 */
 			lnet_net_unlock(LNET_LOCK_EX);
+			mutex_unlock(&the_lnet.ln_api_mutex);
+
 			++i;
 			if ((i & (-i)) == i) {
 				CDEBUG(D_WARNING, "Waiting for zombie LNI %s\n",
 				       libcfs_nid2str(ni->ni_nid));
 			}
 			schedule_timeout_uninterruptible(HZ);
+
+			mutex_lock(&the_lnet.ln_api_mutex);
 			lnet_net_lock(LNET_LOCK_EX);
 			continue;
 		}
-- 
1.8.3.1


