[lustre-devel] [PATCH 284/622] lnet: libcfs: poll fail_loc in cfs_fail_timeout_set()

James Simmons jsimmons at infradead.org
Thu Feb 27 13:12:32 PST 2020


From: Alex Zhuravlev <bzzz at whamcloud.com>

Some internal tests usually take 800-900s, which is almost
half of the whole sanityn test suite run time. 99.(9)% of
that time the tests just wait to ensure that the operations
execute in a specific order.

The patch changes cfs_fail_timeout_set() so that it can
interrupt waiting if fail_loc is set to 0. fail_loc is
polled at 1/10s intervals.

The tests themselves are modified to reset fail_loc. To be
able to do so, both operations (referred to as OP1 and OP2
in the tests) are run in the background. Once they are
started and the pdo_sched() helper has ensured that the MDS
threads have reached the blocking points, we can interrupt
OP1 and do the usual checks.

ONLY=40-47 sh sanityn.sh takes 1017s before and 78s after.

WC-bug-id: https://jira.whamcloud.com/browse/LU-2233
Lustre-commit: 743b85a32e24 ("LU-2233 tests: improve tests sanityn/40-47")
Signed-off-by: Alex Zhuravlev <bzzz at whamcloud.com>
Reviewed-on: https://review.whamcloud.com/4392
Reviewed-by: Andreas Dilger <adilger at whamcloud.com>
Reviewed-by: Mike Pershin <mpershin at whamcloud.com>
Signed-off-by: James Simmons <jsimmons at infradead.org>
---
 net/lnet/libcfs/fail.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/net/lnet/libcfs/fail.c b/net/lnet/libcfs/fail.c
index 6ee4de2..40e93b00 100644
--- a/net/lnet/libcfs/fail.c
+++ b/net/lnet/libcfs/fail.c
@@ -131,14 +131,21 @@ int __cfs_fail_check_set(u32 id, u32 value, int set)
 
 int __cfs_fail_timeout_set(u32 id, u32 value, int ms, int set)
 {
+	ktime_t till = ktime_add_ms(ktime_get(), ms);
 	int ret;
 
 	ret = __cfs_fail_check_set(id, value, set);
 	if (ret && likely(ms > 0)) {
-		CERROR("cfs_fail_timeout id %x sleeping for %dms\n",
-		       id, ms);
-		schedule_timeout_uninterruptible(ms * HZ / 1000);
-		CERROR("cfs_fail_timeout id %x awake\n", id);
+		CERROR("cfs_fail_timeout id %x sleeping for %dms\n", id, ms);
+		while (ktime_before(ktime_get(), till)) {
+			schedule_timeout_uninterruptible(HZ / 10);
+			if (!cfs_fail_loc) {
+				CERROR("cfs_fail_timeout interrupted\n");
+				break;
+			}
+		}
+		if (cfs_fail_loc)
+			CERROR("cfs_fail_timeout id %x awake\n", id);
 	}
 	return ret;
 }
-- 
1.8.3.1
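
For readers who want to try the polling pattern outside the kernel, here
is a minimal userspace sketch of the same idea: compute a deadline once,
sleep in short slices, and stop early when the flag is cleared. It is
not the libcfs code. The names demo_fail_loc, now_ms() and
demo_wait_or_cleared() are hypothetical, and clock_gettime()/nanosleep()
stand in for ktime_get()/schedule_timeout_uninterruptible().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for cfs_fail_loc: nonzero means a fail point is armed. */
static atomic_ulong demo_fail_loc;

/* Monotonic clock in milliseconds (userspace substitute for ktime_get()). */
static int64_t now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Wait for up to @ms milliseconds, waking every @poll_ms to check the flag.
 * Returns true if the wait was cut short because the flag became 0,
 * false if the full timeout elapsed.
 */
static bool demo_wait_or_cleared(int ms, int poll_ms)
{
	int64_t till = now_ms() + ms;

	assert(poll_ms > 0);

	while (now_ms() < till) {
		struct timespec nap = {
			.tv_sec = poll_ms / 1000,
			.tv_nsec = (poll_ms % 1000) * 1000000L,
		};

		/* Sleep one slice first, then check the flag, as in the patch. */
		nanosleep(&nap, NULL);
		if (atomic_load(&demo_fail_loc) == 0)
			return true;
	}
	return false;
}
```

With this shape, a caller that armed the flag and then clears it from
another thread sees the waiter return within roughly one polling
interval instead of after the full timeout. The cost is one wakeup per
interval while waiting.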


