[lustre-devel] [PATCH 23/42] lustre: ptlrpc: don't panic during reconnection
James Simmons
jsimmons at infradead.org
Mon Jan 23 15:00:36 PST 2023
From: Alexander Boyko <alexander.boyko at hpe.com>
ptl_send_rpc() can race with ptlrpc_connect_import_locked()
in the middle of the assertion check, which leads to a false panic.
The first part of the assertion checks
(AT_OFF || imp->imp_state != LUSTRE_IMP_FULL ||
then reconnect changes the import state and flags,
and the second part
(imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) ||
!(imp->imp_connect_data.ocd_connect_flags & OBD_CONNECT_AT)))
is evaluated after MSGHDR_AT_SUPPORT has been disabled for the
client reconnection.
Taking a lock in this hot path is undesirable, so the fix replaces
the assertion with a debug report.
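The change from assertion to report can be illustrated outside the kernel
with a minimal userspace sketch. Everything below is hypothetical and is not
Lustre code: struct import_sketch, its fields, at_state_consistent() and
check_and_report() are stand-ins invented for illustration, and fprintf()
stands in for DEBUG_REQ(). Only the boolean shape of the condition is taken
from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the import state; not the Lustre definitions. */
enum imp_state_sketch { IMP_CONNECTING, IMP_FULL };

struct import_sketch {
	enum imp_state_sketch state; /* stands in for imp->imp_state */
	bool msghdr_at_support;      /* imp_msghdr_flags & MSGHDR_AT_SUPPORT */
	bool connect_at;             /* ocd_connect_flags & OBD_CONNECT_AT */
};

/* Same boolean shape as the condition checked in the patch. */
static bool at_state_consistent(bool at_off, const struct import_sketch *imp)
{
	return at_off || imp->state != IMP_FULL ||
	       imp->msghdr_at_support || !imp->connect_at;
}

/*
 * Report an inconsistent snapshot instead of aborting.
 * Returns true when a report was emitted, false when the state looked fine.
 */
static bool check_and_report(bool at_off, const struct import_sketch *imp)
{
	if (at_state_consistent(at_off, imp))
		return false;

	/* fprintf() is an assumption of this sketch; the patch uses DEBUG_REQ(). */
	fprintf(stderr,
		"Wrong state of import detected, AT=%d, imp=%d, msghdr=%d, conn=%d\n",
		at_off, imp->state != IMP_FULL,
		imp->msghdr_at_support, !imp->connect_at);
	return true;
}
```

A snapshot that mixes values read before and after a reconnect (state still
FULL, AT negotiated, header flag already cleared) makes check_and_report()
return true and log, where an assert() on the same condition would abort the
process.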
HPE-bug-id: LUS-10985
WC-bug-id: https://jira.whamcloud.com/browse/LU-16297
Lustre-commit: df31c4c0b39b88459 ("LU-16297 ptlrpc: don't panic during reconnection")
Signed-off-by: Alexander Boyko <alexander.boyko at hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49029
Reviewed-by: Andreas Dilger <adilger at whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev at hpe.com>
Reviewed-by: Mikhail Pershin <mpershin at whamcloud.com>
Reviewed-by: Oleg Drokin <green at whamcloud.com>
Signed-off-by: James Simmons <jsimmons at infradead.org>
---
fs/lustre/ptlrpc/niobuf.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/fs/lustre/ptlrpc/niobuf.c b/fs/lustre/ptlrpc/niobuf.c
index 670bfb0de02f..09f68157b883 100644
--- a/fs/lustre/ptlrpc/niobuf.c
+++ b/fs/lustre/ptlrpc/niobuf.c
@@ -579,13 +579,20 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
/**
* For enabled AT all request should have AT_SUPPORT in the
- * FULL import state when OBD_CONNECT_AT is set
+ * FULL import state when OBD_CONNECT_AT is set.
+ * This check has a race with ptlrpc_connect_import_locked()
+ * with low chance, don't panic, only report.
*/
- LASSERT(AT_OFF || imp->imp_state != LUSTRE_IMP_FULL ||
- (imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) ||
- !(imp->imp_connect_data.ocd_connect_flags &
- OBD_CONNECT_AT));
-
+ if (!(AT_OFF || imp->imp_state != LUSTRE_IMP_FULL ||
+ (imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) ||
+ !(imp->imp_connect_data.ocd_connect_flags & OBD_CONNECT_AT))) {
+ DEBUG_REQ(D_HA, request,
+ "Wrong state of import detected, AT=%d, imp=%d, msghdr=%d, conn=%d\n",
+ AT_OFF, imp->imp_state != LUSTRE_IMP_FULL,
+ (imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT),
+ !(imp->imp_connect_data.ocd_connect_flags &
+ OBD_CONNECT_AT));
+ }
if (request->rq_resend)
lustre_msg_add_flags(request->rq_reqmsg, MSG_RESENT);
--
2.27.0