[lustre-devel] [PATCH 12/28] lustre: ptlrpc: do not wakeup every second

James Simmons jsimmons at infradead.org
Sun Nov 4 12:53:35 PST 2018


> > Neil,
> >
> > Does your statement imply this would spin?  It definitely doesn’t just
> > spin (that behavior in a main “wait for work” spot of a (depending on
> > settings) ~per-CPU daemon would render systems unusable and this patch
> > has been in testing for a while).  So what is the detailed behavior of
> > a “timeout that expires immediately”?
> 
> Hi Patrick,
>  it definitely spins for me.
> 
>  I should have clarified that the SFS patch
> 
>    e81847bd0651 LU-9660 ptlrpc: do not wakeup every second
> 
>  is correct, as __l_wait_event() treats a timeout value of 0 as meaning an
>  indefinite timeout.
>  The error was in the conversion to wait_event_idle_timeout().  The
>  various wait_event*timeout() functions treat 0 as zero jiffies, i.e. an
>  immediate timeout.
>  If you want to not have a timeout, you need to not use the *_timeout()
>  version.
>  If a timeout is undesirable rather than fatal, then
>  MAX_SCHEDULE_TIMEOUT can be used.  In this case, that seemed best.
> 
> Thanks,
> NeilBrown
> 
> 
> >
> > - Patrick
> >
> >
> > ________________________________
> > From: lustre-devel <lustre-devel-bounces at lists.lustre.org> on behalf of NeilBrown <neilb at suse.com>
> > Sent: Sunday, October 28, 2018 7:03:02 PM
> > To: James Simmons; Andreas Dilger; Oleg Drokin
> > Cc: Lustre Development List
> > Subject: Re: [lustre-devel] [PATCH 12/28] lustre: ptlrpc: do not wakeup every second
> >
> > On Sun, Oct 14 2018, James Simmons wrote:
> >
> >> From: Alex Zhuravlev <bzzz at whamcloud.com>
> >>
> >> Even if there are no RPC requests on the set, there is no need to
> >> wake up every second. The thread is woken up when a request is added
> >> to the set or when the STOP bit is set, so it is sufficient to only
> >> wake up when there are requests on the set to worry about.
> >>
> >> Signed-off-by: Alex Zhuravlev <bzzz at whamcloud.com>
> >> WC-bug-id: https://jira.whamcloud.com/browse/LU-9660
> >> Reviewed-on: https://review.whamcloud.com/28776
> >> Reviewed-by: Andreas Dilger <adilger at whamcloud.com>
> >> Reviewed-by: Patrick Farrell <paf at cray.com>
> >> Reviewed-by: Oleg Drokin <green at whamcloud.com>
> >> Signed-off-by: James Simmons <jsimmons at infradead.org>
> >> ---
> >>  drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c | 4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c b/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c
> >> index c201a88..5b4977b 100644
> >> --- a/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c
> >> +++ b/drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c
> >> @@ -371,7 +371,7 @@ static int ptlrpcd_check(struct lu_env *env, struct ptlrpcd_ctl *pc)
> >>                }
> >>        }
> >>
> >> -     return rc;
> >> +     return rc || test_bit(LIOD_STOP, &pc->pc_flags);
> >>  }
> >>
> >>  /**
> >> @@ -441,7 +441,7 @@ static int ptlrpcd(void *arg)
> >>                lu_context_enter(env.le_ses);
> >>                if (wait_event_idle_timeout(set->set_waitq,
> >>                                            ptlrpcd_check(&env, pc),
> >> -                                         (timeout ? timeout : 1) * HZ) == 0)
> >> +                                         timeout * HZ) == 0)
> >>                        ptlrpc_expired_set(set);
> >
> > This is incorrect.
> > A timeout of zero means the timeout happens after zero jiffies
> > (immediately), it doesn't mean there is no timeout.
> > If we want a "timeout" of zero to mean "Wait forever", we need something
> > like:
> >
> >   wait_event_idle_timeout(.....,
> >                           timeout ? (timeout * HZ) : MAX_SCHEDULE_TIMEOUT) == 0
> >
> > I've updated the patch accordingly.

I made that change locally as well and my CPU load problem went away.
Thanks for figuring it out. I will be more careful in the future.
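
For anyone hitting this later, here is a minimal userspace sketch of the
conversion Neil describes. It is not the actual ptlrpcd code. The helper
names and SKETCH_HZ are made up for illustration, and MAX_SCHEDULE_TIMEOUT
is redefined locally as LONG_MAX (which is how the kernel defines it) so
the snippet builds outside the kernel. It only shows how a "0 means no
timeout" value has to be mapped before it is handed to a
wait_event*_timeout() style call.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: stand-in for the kernel's HZ (jiffies per second). */
#define SKETCH_HZ 250L

/* Local copy for userspace; the kernel defines this as LONG_MAX. */
#ifndef MAX_SCHEDULE_TIMEOUT
#define MAX_SCHEDULE_TIMEOUT LONG_MAX
#endif

/*
 * Hypothetical helper: what the patch as posted passes to
 * wait_event_idle_timeout().  A timeout of 0 seconds becomes 0 jiffies,
 * so the wait expires immediately and the caller loops without sleeping.
 */
static long buggy_wait_jiffies(long timeout_sec)
{
	assert(timeout_sec >= 0);
	return timeout_sec * SKETCH_HZ;
}

/*
 * Hypothetical helper: the corrected mapping.  A timeout of 0 seconds is
 * treated as "no deadline" and becomes MAX_SCHEDULE_TIMEOUT.
 */
static long ptlrpcd_wait_jiffies(long timeout_sec)
{
	assert(timeout_sec >= 0);
	return timeout_sec ? timeout_sec * SKETCH_HZ : MAX_SCHEDULE_TIMEOUT;
}
```

With the first helper a zero timeout yields zero jiffies, which matches
the spinning I saw. With the second it yields MAX_SCHEDULE_TIMEOUT, so the
thread only wakes when a request is added or the STOP bit is set.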

