[lustre-devel] [PATCH 05/19] lnet: libcfs: add timeout to cfs_race() to fix race

James Simmons jsimmons at infradead.org
Sun Nov 28 15:27:40 PST 2021

From: Alex Zhuravlev <bzzz at whamcloud.com>

There is no guarantee that the branches in cfs_race() are executed
in strict order, so the second branch (the one that sets
cfs_race_state=1) can run before the first branch. A thread that
then executes the first branch resets the state, misses the wakeup,
and gets stuck.

This construction is used for testing only, so as a workaround it
is enough to make the wait time out.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13358
Lustre-commit: 2d2d381f35ee00431 ("LU-13358 libcfs: add timeout to cfs_race() to fix race")
Signed-off-by: Alex Zhuravlev <bzzz at whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43161
Reviewed-by: James Simmons <jsimmons at infradead.org>
Reviewed-by: Neil Brown <neilb at suse.de>
Reviewed-by: Oleg Drokin <green at whamcloud.com>
Signed-off-by: James Simmons <jsimmons at infradead.org>
 include/linux/libcfs/libcfs_fail.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/linux/libcfs/libcfs_fail.h b/include/linux/libcfs/libcfs_fail.h
index 45166c5..731401b 100644
--- a/include/linux/libcfs/libcfs_fail.h
+++ b/include/linux/libcfs/libcfs_fail.h
@@ -213,8 +213,14 @@ static inline void cfs_race_wait(u32 id)
 			cfs_race_state = 0;
 			CERROR("cfs_race id %x sleeping\n", id);
-			rc = wait_event_interruptible(cfs_race_waitq,
-						      cfs_race_state != 0);
+			/*
+			 * XXX: don't wait forever as there is no guarantee
+			 * that this branch is executed first. for testing
+			 * purposes this construction works good enough
+			 */
+			rc = wait_event_interruptible_timeout(cfs_race_waitq,
+							      cfs_race_state != 0,
+							      5 * HZ);
 			CERROR("cfs_fail_race id %x awake: rc=%d\n", id, rc);
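
For readers outside the kernel, the ordering problem and the bounded wait can be reproduced with a small userspace sketch. This is only an illustration, not the libcfs code: it assumes POSIX threads, and the names race_state, race_wakeup() and race_wait() are hypothetical stand-ins for cfs_race_state and the two branches of cfs_race(). pthread_cond_timedwait() plays the role that wait_event_interruptible_timeout() plays in the patch.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical userspace stand-ins for cfs_race_waitq / cfs_race_state. */
static pthread_mutex_t race_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t race_cond = PTHREAD_COND_INITIALIZER;
static int race_state;

/* "Second branch": set the flag and wake whoever is waiting (if anyone). */
static void race_wakeup(void)
{
	pthread_mutex_lock(&race_lock);
	race_state = 1;
	pthread_cond_broadcast(&race_cond);
	pthread_mutex_unlock(&race_lock);
}

/*
 * "First branch": reset the flag, then wait for it to become non-zero,
 * but give up after timeout_ms. Returns 1 if woken, 0 on timeout.
 */
static int race_wait(long timeout_ms)
{
	struct timespec deadline;
	int woken;
	int rc;

	/* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
	rc = clock_gettime(CLOCK_REALTIME, &deadline);
	assert(rc == 0);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&race_lock);
	race_state = 0;		/* discards a wakeup that arrived too early */
	rc = 0;
	while (race_state == 0 && rc != ETIMEDOUT)
		rc = pthread_cond_timedwait(&race_cond, &race_lock, &deadline);
	woken = race_state != 0;
	pthread_mutex_unlock(&race_lock);

	return woken;
}
```

Calling race_wakeup() before race_wait(50) reproduces the bad ordering: the early wakeup is overwritten by the reset, and the waiter returns 0 once the deadline passes. With an unbounded pthread_cond_wait() the same call sequence would block forever. Older toolchains may need -pthread when building the sketch.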
