[lustre-devel] [PATCH 16/28] lustre: statahead: missing barrier before wake_up
James Simmons
jsimmons at infradead.org
Sun Oct 21 15:52:37 PDT 2018
> On Sun, Oct 14 2018, James Simmons wrote:
>
> > From: Lai Siyao <lai.siyao at whamcloud.com>
> >
> > A barrier is missing before wake_up() in ll_statahead_interpret(),
> > which may cause 'ls' hang. Under the right conditions a basic 'ls'
> > can fail. The debug logs show:
> >
> > statahead.c:683:ll_statahead_interpret()) sa_entry software rc -13
> > statahead.c:1666:ll_statahead()) revalidate statahead software: -11.
> >
> > Obviously statahead failure didn't notify 'ls' process in time.
> > The mi_cbdata can be stale so add a barrier before calling
> > wake_up().
> >
> > Signed-off-by: Lai Siyao <lai.siyao at whamcloud.com>
> > Signed-off-by: Bob Glossman <bob.glossman at intel.com>
> > WC-bug-id: https://jira.whamcloud.com/browse/LU-9210
> > Reviewed-on: https://review.whamcloud.com/27330
> > Reviewed-by: Nathaniel Clark <nclark at whamcloud.com>
> > Reviewed-by: Oleg Drokin <green at whamcloud.com>
> > Signed-off-by: James Simmons <jsimmons at infradead.org>
> > ---
> > drivers/staging/lustre/lustre/llite/statahead.c | 8 +++++++-
> > 1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/staging/lustre/lustre/llite/statahead.c b/drivers/staging/lustre/lustre/llite/statahead.c
> > index 1ad308c..0174a4c 100644
> > --- a/drivers/staging/lustre/lustre/llite/statahead.c
> > +++ b/drivers/staging/lustre/lustre/llite/statahead.c
> > @@ -680,8 +680,14 @@ static int ll_statahead_interpret(struct ptlrpc_request *req,
> >
> > spin_lock(&lli->lli_sa_lock);
> > if (rc) {
> > - if (__sa_make_ready(sai, entry, rc))
> > + if (__sa_make_ready(sai, entry, rc)) {
> > + /* LU-9210 : Under the right conditions even 'ls'
> > + * can cause the statahead to fail. Using a memory
> > + * barrier resolves this issue.
> > + */
> > + smp_mb();
> > wake_up(&sai->sai_waitq);
> > + }
> > } else {
> > int first = 0;
> > entry->se_minfo = minfo;
> > --
> > 1.8.3.1
>
> Again, this is a fairly lame comment to justify the smp_mb().
> It appears to me that the issue is most likely the value of
> entry->se_state.
> __sa_make_ready() sets this and revalidate_statahead_dentry tests it
> after waiting on sai_waitq.
> So I think it would be best if we changed __sa_make_ready() to
>
> smp_store_release(&entry->se_state, ret < 0 ? SA_ENTRY_INVA : SA_ENTRY_SUCC)
>
> and in ll_statahead_interpret() have
>
> if (smp_load_acquire(&entry->se_state) == SA_ENTRY_SUCC &&
> entry->se_inode) {
>
> This would make it obvious which variable was important, and would show
> the paired synchronization points.
If you think this is lame you should be the JIRA ticket and the original
patch. It had zero info so I attempted to extract what I could out of the
ticket. Hopefully Lai can fill in the details. I have no problems fixing
this another way. I don't see a way in the ticket to easily reproduce this
problem to see the new approach would fix it :-(
More information about the lustre-devel
mailing list