[lustre-devel] [PATCH 23/42] lustre: ptlrpc: don't panic during reconnection

James Simmons jsimmons at infradead.org
Mon Jan 23 15:00:36 PST 2023


From: Alexander Boyko <alexander.boyko at hpe.com>

ptl_send_rpc() can race with ptlrpc_connect_import_locked() in the
middle of an assertion check, which leads to a spurious panic.
The assertion first evaluates

(AT_OFF || imp->imp_state != LUSTRE_IMP_FULL ||

and finds the import in the FULL state. A reconnect then changes the
import state and flags before the second part is evaluated:

(imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) ||
!(imp->imp_connect_data.ocd_connect_flags & OBD_CONNECT_AT))

MSGHDR_AT_SUPPORT is cleared during client reconnection, so the second
part fails even though the import was never in an invalid state.
Taking a lock on this hot path is undesirable, so the fix replaces the
assertion with a debug report.
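
The race described above can be reproduced deterministically in user
space. The following is only a minimal sketch, not the Lustre code: the
struct, the SKETCH_* constants and all function names are hypothetical,
the constant values are placeholders, and it assumes that OBD_CONNECT_AT
stays set across the reconnect. It builds the "torn" view a lockless
reader could observe (state read before the reconnect, flags read after
it) and reports it instead of asserting.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Placeholder values for illustration only; not the Lustre definitions. */
enum sketch_imp_state { SKETCH_IMP_CONNECTING = 1, SKETCH_IMP_FULL = 2 };
#define SKETCH_MSGHDR_AT_SUPPORT 0x1u
#define SKETCH_CONNECT_AT        0x2u

/* Hypothetical stand-in for the three import fields the check reads. */
struct sketch_import {
	enum sketch_imp_state state;
	uint32_t msghdr_flags;
	uint64_t connect_flags;
};

/* Same boolean shape as the condition in the patch. */
static bool at_invariant_holds(bool at_off, const struct sketch_import *imp)
{
	return at_off || imp->state != SKETCH_IMP_FULL ||
	       (imp->msghdr_flags & SKETCH_MSGHDR_AT_SUPPORT) ||
	       !(imp->connect_flags & SKETCH_CONNECT_AT);
}

/* Report instead of asserting; returns true when a report was emitted. */
static bool check_and_report(bool at_off, const struct sketch_import *imp)
{
	if (at_invariant_holds(at_off, imp))
		return false;

	fprintf(stderr,
		"inconsistent import view: at_off=%d, not_full=%d, msghdr=%d, no_conn_at=%d\n",
		at_off, imp->state != SKETCH_IMP_FULL,
		!!(imp->msghdr_flags & SKETCH_MSGHDR_AT_SUPPORT),
		!(imp->connect_flags & SKETCH_CONNECT_AT));
	return true;
}

/*
 * Build the view a lockless reader could see: the state was read before
 * the reconnect started, the flags were read after it changed them.
 */
static struct sketch_import torn_view(const struct sketch_import *before,
				      const struct sketch_import *after)
{
	struct sketch_import seen = *after;

	seen.state = before->state;
	return seen;
}
```

In this sketch both consistent snapshots (before and after the
reconnect) satisfy the condition; only the mixed view fails it, which is
why a failure here need not indicate a real bug.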

HPE-bug-id: LUS-10985
WC-bug-id: https://jira.whamcloud.com/browse/LU-16297
Lustre-commit: df31c4c0b39b88459 ("LU-16297 ptlrpc: don't panic during reconnection")
Signed-off-by: Alexander Boyko <alexander.boyko at hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49029
Reviewed-by: Andreas Dilger <adilger at whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev at hpe.com>
Reviewed-by: Mikhail Pershin <mpershin at whamcloud.com>
Reviewed-by: Oleg Drokin <green at whamcloud.com>
Signed-off-by: James Simmons <jsimmons at infradead.org>
---
 fs/lustre/ptlrpc/niobuf.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/fs/lustre/ptlrpc/niobuf.c b/fs/lustre/ptlrpc/niobuf.c
index 670bfb0de02f..09f68157b883 100644
--- a/fs/lustre/ptlrpc/niobuf.c
+++ b/fs/lustre/ptlrpc/niobuf.c
@@ -579,13 +579,20 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
 
 	/**
 	 * For enabled AT all request should have AT_SUPPORT in the
-	 * FULL import state when OBD_CONNECT_AT is set
+	 * FULL import state when OBD_CONNECT_AT is set.
+	 * This check can race with ptlrpc_connect_import_locked(),
+	 * though rarely, so don't panic; only report.
 	 */
-	LASSERT(AT_OFF || imp->imp_state != LUSTRE_IMP_FULL ||
-		(imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) ||
-		!(imp->imp_connect_data.ocd_connect_flags &
-		OBD_CONNECT_AT));
-
+	if (!(AT_OFF || imp->imp_state != LUSTRE_IMP_FULL ||
+	    (imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) ||
+	    !(imp->imp_connect_data.ocd_connect_flags & OBD_CONNECT_AT))) {
+		DEBUG_REQ(D_HA, request,
+			  "Wrong state of import detected, AT=%d, imp=%d, msghdr=%d, conn=%d\n",
+			  AT_OFF, imp->imp_state != LUSTRE_IMP_FULL,
+			  (imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT),
+			  !(imp->imp_connect_data.ocd_connect_flags &
+			    OBD_CONNECT_AT));
+	}
 	if (request->rq_resend)
 		lustre_msg_add_flags(request->rq_reqmsg, MSG_RESENT);
 
-- 
2.27.0


