[lustre-devel] [PATCH 20/25] lustre: osc: osc: Do not flush on lockless cancel

James Simmons jsimmons at infradead.org
Mon Aug 2 12:50:40 PDT 2021


From: Patrick Farrell <pfarrell at whamcloud.com>

The cancellation of a an OSC lock without an LDLM lock
(a 'lockless' OSC lock) should not flush pages.  Only
direct i/o is allowed to use a lockless OSC lock, and
direct i/o does not create flushable pages.

DIO pages are not flushable because:
A) all synced ASAP, and
B) the OSC extents created for them are not added to the
extent tree which is used to track these pages.

Instead, this has the effect of trying to flush pages from
ongoing buffered i/o.  This can lead to crashes like the
following:

osc_cache_writeback_range()) ASSERTION(hp == 0 && discard == 0) failed

This assert essentially says the lock cancellation
(hp == 1) found an active i/o (an extent in the OES_ACTIVE
state).

This is not allowed because the flushing code assumes an
LDLM lock is being cancelled, which will only start once
there is no active i/o.  Because the OSC lock being
cancelled is not associated with an LDLM lock, this is not
true, and nothing prevents active i/o under a different
lock, leading to this assert.

The solution is simply to not flush pages when cancelling a
no-LDLM-lock OSC lock.

Additional note:
New lockless OSC locks cannot be created if they are
blocked by a regular OSC lock, but a new regular lock can
be created if there is a lockless lock present.

Thus, the sequence is something like this:
Direct i/o creates lockless OSC lock
Buffered i/o creates OSC and LDLM lock on the same range
Direct i/o finishes, starts cancelling its OSC lock
Buffered i/o is still ongoing, with extents in OES_ACTIVE

This results in the above crash during the OSC lock
cancellation.

Note it would be possible to resolve this issue by not
allowing lockless OSC locks to match regular OSC locks, but
this is not necessary, since there's no reason for lockless
locks to flush pages on cancellation.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14814
Lustre-commit: 6717c573ed90da91 ("LU-14814 osc: osc: Do not flush on lockless cancel")
Signed-off-by: Patrick Farrell <pfarrell at whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44152
Reviewed-by: Li Dongyang <dongyangli at ddn.com>
Reviewed-by: Wang Shilong <wshilong at whamcloud.com>
Reviewed-by: Oleg Drokin <green at whamcloud.com>
Signed-off-by: James Simmons <jsimmons at infradead.org>
---
 fs/lustre/osc/osc_lock.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c
index f6faed7..eb3cb58 100644
--- a/fs/lustre/osc/osc_lock.c
+++ b/fs/lustre/osc/osc_lock.c
@@ -1134,16 +1134,8 @@ static void osc_lock_lockless_cancel(const struct lu_env *env,
 {
 	struct osc_lock *ols = cl2osc_lock(slice);
 	struct osc_object *osc = cl2osc(slice->cls_obj);
-	struct cl_lock_descr *descr = &slice->cls_lock->cll_descr;
-	int result;
 
 	LASSERT(!ols->ols_dlmlock);
-	result = osc_lock_flush(osc, descr->cld_start, descr->cld_end,
-				descr->cld_mode, false);
-	if (result)
-		CERROR("Pages for lockless lock %p were not purged(%d)\n",
-		       ols, result);
-
 	osc_lock_wake_waiters(env, osc, ols);
 }
 
-- 
1.8.3.1



More information about the lustre-devel mailing list