[lustre-devel] [PATCH 03/28] lustre: ptlrpc: missing barrier before wake_up

James Simmons jsimmons at infradead.org
Sun Oct 21 15:48:16 PDT 2018


> On Sun, Oct 14 2018, James Simmons wrote:
> 
> > From: Lai Siyao <lai.siyao at whamcloud.com>
> >
> > ptlrpc_client_wake_req() misses a memory barrier, which may cause
> > strange errors.
> >
> > Signed-off-by: Lai Siyao <lai.siyao at whamcloud.com>
> > WC-bug-id: https://jira.whamcloud.com/browse/LU-8935
> > Reviewed-on: https://review.whamcloud.com/26583
> > Reviewed-by: Andreas Dilger <adilger at whamcloud.com>
> > Reviewed-by: Wang Shilong <wshilong at ddn.com>
> > Reviewed-by: Oleg Drokin <green at whamcloud.com>
> > Signed-off-by: James Simmons <jsimmons at infradead.org>
> > ---
> >  drivers/staging/lustre/lustre/include/lustre_net.h | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/staging/lustre/lustre/include/lustre_net.h b/drivers/staging/lustre/lustre/include/lustre_net.h
> > index ce7e98c..468a03e 100644
> > --- a/drivers/staging/lustre/lustre/include/lustre_net.h
> > +++ b/drivers/staging/lustre/lustre/include/lustre_net.h
> > @@ -2211,6 +2211,8 @@ static inline int ptlrpc_status_ntoh(int n)
> >  static inline void
> >  ptlrpc_client_wake_req(struct ptlrpc_request *req)
> >  {
> > +	/* ensure ptlrpc_register_bulk see rq_resend as set. */
> > +	smp_mb();
> >  	if (!req->rq_set)
> >  		wake_up(&req->rq_reply_waitq);
> >  	else
> 
> It is good that this memory barrier has a comment, but the comment isn't
> very helpful.
> There is no matching memory barrier in ptlrpc_register_bulk(), so it
> isn't clear what sequencing is important.
> 
> And ptl_send_rpc() tests ->rq_resend *before* ptlrpc_register_bulk() is
> called (which also tests it).  Presumably these should see that same
> value?  So why does the comment refer to ptlrpc_register_bulk() instead
> of ptl_send_rpc() ??
> 
> It all seems rather confusing, so it is very hard to be sure that the
> code is now correct.
> Is someone able to explain?

I didn't have much to go on here. While the Linux kernel asks for a
comment to be placed with every memory barrier, Lustre developers tend
never to leave an explanation of why a memory barrier was needed. In this
case I examined the original JIRA ticket and found this in one of the
comments:

"It looks like ptlrpc_client_wake_req() misses a memory barrier, which may 
cause ptlrpc_resend_req() wake up ptlrpc_send_rpc -> ptlrpc_register_bulk, 
while the latter doesn't see rq_resend set."

I attempted to add that as a comment. This is all I had to go on. Lai is
now CC'd on this email, so maybe he remembers what it was all about.
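
For what it's worth, the ordering that the ticket comment seems to be
after can be sketched in userspace C11. This is only an analogy under
stated assumptions, not Lustre code: struct demo_req, demo_resend_and_wake(),
demo_waiter() and demo_run() are hypothetical names, a plain atomic flag
stands in for the wait queue, and atomic_thread_fence() stands in for
smp_mb(). It also shows why a barrier on the waking side alone orders
nothing: the woken side needs a matching barrier between observing the
wakeup and reading the flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the request; not the real struct ptlrpc_request. */
struct demo_req {
	atomic_int resend;	/* plays the role of rq_resend */
	atomic_int woken;	/* plays the role of the wakeup itself */
};

/* Waker side: mark the request for resend, then publish the wakeup. */
static void demo_resend_and_wake(struct demo_req *req)
{
	atomic_store_explicit(&req->resend, 1, memory_order_relaxed);
	/* Userspace analogue of smp_mb() before the wakeup. */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&req->woken, 1, memory_order_relaxed);
}

/* Woken side: wait for the wakeup, then read the resend flag. */
static void *demo_waiter(void *arg)
{
	struct demo_req *req = arg;

	while (!atomic_load_explicit(&req->woken, memory_order_relaxed))
		;	/* busy-wait only to keep the sketch short */
	/* Matching barrier; without it the load below is not ordered. */
	atomic_thread_fence(memory_order_seq_cst);
	return (void *)(intptr_t)atomic_load_explicit(&req->resend,
						      memory_order_relaxed);
}

/* Run one waker/woken pair; return the resend value the woken side saw. */
static int demo_run(void)
{
	struct demo_req req;
	pthread_t tid;
	void *seen = NULL;

	atomic_init(&req.resend, 0);
	atomic_init(&req.woken, 0);
	if (pthread_create(&tid, NULL, demo_waiter, &req) != 0)
		return -1;
	demo_resend_and_wake(&req);
	pthread_join(tid, &seen);
	return (int)(intptr_t)seen;
}
```

With both fences in place the woken thread is guaranteed to read resend
as set once it has seen the wakeup. Whether the real code has an
equivalent pairing on the ptl_send_rpc() side is exactly the open
question.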
 
> Thanks,
> NeilBrown
> 
> 
> 
> > -- 
> > 1.8.3.1
> 

