[lustre-devel] [PATCH 277/622] lustre: ptlrpc: ASSERTION (req_transno < next_transno) failed

James Simmons jsimmons at infradead.org
Thu Feb 27 13:12:25 PST 2020


From: Andriy Skulysh <c17819 at cray.com>

An update request is checked for duplicates by xid in
is_req_replayed_by_update(). However, an xid is unique only
per client, so two requests from different clients can carry
the same xid.

Perform the lookup by transno instead, which is unique per MDT.

Cray-bug-id: LUS-6015
WC-bug-id: https://jira.whamcloud.com/browse/LU-11251
Lustre-commit: 53764826b95f ("LU-11251 mdt: ASSERTION (req_transno < next_transno) failed")
Signed-off-by: Andriy Skulysh <c17819 at cray.com>
Reviewed-by: Vitaly Fertman <c17818 at cray.com>
Reviewed-by: Alexander Boyko <c17825 at cray.com>
Reviewed-on: https://review.whamcloud.com/33001
Reviewed-by: Alexandr Boyko <c17825 at cray.com>
Reviewed-by: Oleg Drokin <green at whamcloud.com>
Signed-off-by: James Simmons <jsimmons at infradead.org>
---
 fs/lustre/include/obd_support.h |  3 ++-
 fs/lustre/ptlrpc/client.c       | 11 ++++++++---
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h
index 4e956da..837b68d 100644
--- a/fs/lustre/include/obd_support.h
+++ b/fs/lustre/include/obd_support.h
@@ -355,7 +355,8 @@
 #define OBD_FAIL_PTLRPC_DROP_BULK			0x51a
 #define OBD_FAIL_PTLRPC_LONG_REQ_UNLINK			0x51b
 #define OBD_FAIL_PTLRPC_LONG_BOTH_UNLINK		0x51c
-#define OBD_FAIL_PTLRPC_BULK_ATTACH      0x521
+#define OBD_FAIL_PTLRPC_BULK_ATTACH			0x521
+#define OBD_FAIL_PTLRPC_ROUND_XID			0x530
 #define OBD_FAIL_PTLRPC_CONNECT_RACE			0x531
 
 #define OBD_FAIL_OBD_PING_NET				0x600
diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c
index 7c243af..ac16878 100644
--- a/fs/lustre/ptlrpc/client.c
+++ b/fs/lustre/ptlrpc/client.c
@@ -712,6 +712,8 @@ static inline void ptlrpc_assign_next_xid(struct ptlrpc_request *req)
 	spin_unlock(&req->rq_import->imp_lock);
 }
 
+static atomic64_t ptlrpc_last_xid;
+
 int ptlrpc_request_bufs_pack(struct ptlrpc_request *request,
 			     u32 version, int opcode, char **bufs,
 			     struct ptlrpc_cli_ctx *ctx)
@@ -761,7 +763,6 @@ int ptlrpc_request_bufs_pack(struct ptlrpc_request *request,
 	ptlrpc_at_set_req_timeout(request);
 
 	lustre_msg_set_opc(request->rq_reqmsg, opcode);
-	ptlrpc_assign_next_xid(request);
 
 	/* Let's setup deadline for req/reply/bulk unlink for opcode. */
 	if (cfs_fail_val == opcode) {
@@ -776,6 +777,11 @@ int ptlrpc_request_bufs_pack(struct ptlrpc_request *request,
 		} else if (CFS_FAIL_CHECK(OBD_FAIL_PTLRPC_LONG_BOTH_UNLINK)) {
 			fail_t = &request->rq_reply_deadline;
 			fail2_t = &request->rq_bulk_deadline;
+		} else if (CFS_FAIL_CHECK(OBD_FAIL_PTLRPC_ROUND_XID)) {
+			time64_t now = ktime_get_real_seconds();
+
+			atomic64_set(&ptlrpc_last_xid,
+				     ((u64)now >> 4) << 24);
 		}
 
 		if (fail_t) {
@@ -791,6 +797,7 @@ int ptlrpc_request_bufs_pack(struct ptlrpc_request *request,
 			msleep(4 * MSEC_PER_SEC);
 		}
 	}
+	ptlrpc_assign_next_xid(request);
 
 	return 0;
 
@@ -3085,8 +3092,6 @@ void ptlrpc_abort_set(struct ptlrpc_request_set *set)
 	}
 }
 
-static atomic64_t ptlrpc_last_xid;
-
 /**
  * Initialize the XID for the node.  This is common among all requests on
  * this node, and only requires the property that it is monotonically
-- 
1.8.3.1
