[lustre-devel] [PATCH 01/12] lustre: llite: do not take mod rpc slot for getxattr

James Simmons jsimmons at infradead.org
Sun Dec 12 07:07:52 PST 2021

From: Vladimir Saveliev <vlaidimir.saveliev at hpe.com>

The following scenario may lead to client eviction:
clientA                clientB                  MDS
threadA1: write to file F1, get
and hold DoM MDC LDLM lock L1:
     [hold ref until write done]

threadA2-A8: create files F2-F8:
      [hold RPC slot until create done]

                                                OST(s) in recovery.
                                                MDS waiting on OST(s) to
                                                precreate new objects.

    -> cl_io_start()
     -> __generic_file_aio_write()
      -> file_remove_suid()
       -> ll_xattr_cache_refill()
        -> mdc_xattr_common()
         -> ptlrpc_get_mod_rpc_slot()
         [blocked waiting for RPC slot]

                        threadB1: write file F1,
                        enqueue DoM MDC lock L1

                                                MDS sends blocking AST
                                                to clientA for lock L1

ldlm_threadA3: cannot cancel busy lock L1:
   -> ldlm_handle_bl_callback()
   ["Lock L1 referenced, will be cancelled later"]

                                                MDS evicts clientA for
                                                not cancelling lock L1

threadA1: never completes write

The fix is to add IT_GETXATTR to the list of operations that do not
need a mod RPC slot.
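
For illustration only, the effect of the change can be modelled outside
the kernel. The sketch below is not Lustre code: the enum values,
FMODE_WRITE_FLAG, struct intent, struct slot_pool and can_send_now()
are hypothetical stand-ins, and the pool size of 7 used in the note
after it is assumed only to match the seven creating threads in the
scenario. It mirrors the predicate in mdc_skip_mod_rpc_slot() and shows
that a getxattr intent no longer depends on a free slot.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the intent opcodes; values are illustrative. */
enum it_op {
	IT_OPEN     = 1 << 0,
	IT_CREAT    = 1 << 1,
	IT_GETATTR  = 1 << 2,
	IT_LOOKUP   = 1 << 3,
	IT_READDIR  = 1 << 4,
	IT_LAYOUT   = 1 << 5,
	IT_GETXATTR = 1 << 6,
};

/* Hypothetical stand-in for MDS_FMODE_WRITE. */
#define FMODE_WRITE_FLAG 0x2u

struct intent {
	enum it_op op;
	unsigned int flags;
};

/* Simplified model of a bounded pool of modifying-RPC slots. */
struct slot_pool {
	int max;
	int in_use;
};

/* Same shape as the patched predicate: read-only intents skip the slot. */
static bool skip_mod_rpc_slot(const struct intent *it)
{
	if (!it)
		return false;

	switch (it->op) {
	case IT_GETATTR:
	case IT_LOOKUP:
	case IT_READDIR:
	case IT_GETXATTR:	/* the case added by this patch */
		return true;
	case IT_LAYOUT:
		return !(it->flags & FMODE_WRITE_FLAG);
	default:
		return false;
	}
}

/*
 * Return true if the request could be sent without blocking.
 * A request that needs a slot takes one when available; a real
 * client would sleep here instead of returning false.
 */
static bool can_send_now(struct slot_pool *pool, const struct intent *it)
{
	assert(pool->in_use <= pool->max);

	if (skip_mod_rpc_slot(it))
		return true;

	if (pool->in_use < pool->max) {
		pool->in_use++;
		return true;
	}
	return false;
}
```

With a pool of 7 slots all taken by IT_CREAT requests, can_send_now()
still returns true for an IT_GETXATTR intent, while an eighth IT_CREAT
would have to wait. In the scenario above this is what lets threadA1
finish its write and release lock L1.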

Tests illustrating the issue are added.

wait_for_function(): make the total sleep time (wait) equal to max
when 1 is returned.

HPE-bug-id: LUS-7271
WC-bug-id: https://jira.whamcloud.com/browse/LU-12347
Lustre-commit: eb64594e4473af85 ("LU-12347 llite: do not take mod rpc slot for getxattr")
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev at hpe.com>
Reviewed-on: https://review.whamcloud.com/44151
Reviewed-by: Andreas Dilger <adilger at whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko at hpe.com>
Reviewed-by: Oleg Drokin <green at whamcloud.com>
Signed-off-by: James Simmons <jsimmons at infradead.org>
 fs/lustre/include/obd_support.h | 1 +
 fs/lustre/llite/xattr_cache.c   | 2 ++
 fs/lustre/mdc/mdc_locks.c       | 2 +-
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h
index 540e1e0..d57c25c 100644
--- a/fs/lustre/include/obd_support.h
+++ b/fs/lustre/include/obd_support.h
@@ -484,6 +484,7 @@
 #define OBD_FAIL_LLITE_RACE_MOUNT			0x1417
 #define OBD_FAIL_LLITE_PAGE_ALLOC			0x1418
 #define OBD_FAIL_LLITE_OPEN_DELAY			0x1419
+#define OBD_FAIL_LLITE_XATTR_PAUSE			0x1420
 #define OBD_FAIL_FID_INDIR				0x1501
 #define OBD_FAIL_FID_INLMA				0x1502
diff --git a/fs/lustre/llite/xattr_cache.c b/fs/lustre/llite/xattr_cache.c
index b044c89..7c1f5b7 100644
--- a/fs/lustre/llite/xattr_cache.c
+++ b/fs/lustre/llite/xattr_cache.c
@@ -396,6 +396,8 @@ static int ll_xattr_cache_refill(struct inode *inode)
 	u32 *xsizes;
 	int rc, i;
 	rc = ll_xattr_find_get_lock(inode, &oit, &req);
 	if (rc)
 		goto err_req;
diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c
index 66f0039..2c344d7 100644
--- a/fs/lustre/mdc/mdc_locks.c
+++ b/fs/lustre/mdc/mdc_locks.c
@@ -886,7 +886,7 @@ static inline bool mdc_skip_mod_rpc_slot(const struct lookup_intent *it)
 	if (it &&
 	    (it->it_op == IT_GETATTR || it->it_op == IT_LOOKUP ||
-	     it->it_op == IT_READDIR ||
+	     it->it_op == IT_READDIR || it->it_op == IT_GETXATTR ||
 	     (it->it_op == IT_LAYOUT && !(it->it_flags & MDS_FMODE_WRITE))))
 		return true;
 	return false;
