[lustre-devel] Regression with delayed_work in sec_gc

Tue May 22 15:08:17 PDT 2018

On Tue, May 22 2018, Dilger, Andreas wrote:

> On May 20, 2018, at 19:22, NeilBrown <neilb at suse.com> wrote:
>> 
>> On Thu, May 17 2018, Sebastien Buisson wrote:
>> 
>>>> Le 16 mai 2018 à 00:22, NeilBrown <neilb at suse.com> a écrit :
>>>> 
>>>> On Tue, May 15 2018, James Simmons wrote:
>>>> 
>>>>> Dmitry has backported your upstream commit
>>>>> 
>>>>> be3c64da248b1ada2bbcfc3855f07536d8087acb ( ptlrpc: use workqueue for
>>>>> pinger)
>>>>> 
>>>>> and Sebastien tried out the change which can be seen at
>>>>> https://review.whamcloud.com/#/c/31724. Sebastien is having issues with
>>>>> unloading osc, mdc, ost or mdt kernel modules. He writes that he activated
>>>>> Kerberos for my file system, did some IOs, then tried to unmount the
>>>>> client and the  targets. Unfortunately, on some occasions, while
>>>>> unmounting, the mdc module for instance has extra references (as shown by
>>>>> 'lsmod'), and cannot be unloaded. After a while, the extra references are
>>>>> dropped, and the  module can be unloaded. It also happens on occasions on
>>>>> server side, for the mdt or ost modules. I tried destroy_workqueue but
>>>>> that made things worst :-( Do you know what is going on?
>>>> 
>>>> My guess is that ptlrpc_pinger_main() is looping, so
>>>> cancel_delayed_work_sync() is waiting an unusually long time for it.
>>>> This patch might fix the problem - it certainly makes the usage
>>>> of "this_ping" closer to the original code.
>>>> 
>>>> Thanks,
>>>> NeilBrown
>>>> 
>>>> diff --git a/drivers/staging/lustre/lustre/ptlrpc/pinger.c b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
>>>> index 89eef8ec7df4..131b201541e8 100644
>>>> --- a/drivers/staging/lustre/lustre/ptlrpc/pinger.c
>>>> +++ b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
>>>> @@ -222,12 +222,13 @@ static DECLARE_DELAYED_WORK(ping_work, ptlrpc_pinger_main);
>>>> 
>>>> static void ptlrpc_pinger_main(struct work_struct *ws)
>>>> {
>>>> -	unsigned long this_ping = jiffies;
>>>> 	long time_to_next_wake;
>>>> 	struct timeout_item *item;
>>>> 	struct obd_import *imp;
>>>> 
>>>> 	do {
>>>> +		unsigned long this_ping = jiffies;
>>>> +
>>>> 		mutex_lock(&pinger_mutex);
>>>> 		list_for_each_entry(item, &timeout_list, ti_chain) {
>>>> 			item->ti_cb(item, item->ti_cb_data);
>>> 
>>> Still same problem when testing "ptlrpc: use delayed_work in sec_gc" with patch "ptlrpc: use workqueue for pinger" and modification as suggested above. Kernel module ptlrpc_gss holds extra references, and cannot be unloaded.
>>> 
>>> However, patch "ptlrpc: use workqueue for pinger" alone does not have the issue.
>> 
>> Thanks for the clear explanation - helps a lot.  I see the problem - I think.
>> 
>> When (e.g.) ctx_release_kr() is called as part of shutting down a security
>> context, it calls sptlrpc_gc_add_ctx() which signals the gc-thread to
>> re-run.
>> None of this code is in mainline (why isn't the GSS code in
>> mainline????)
>
> We are in the un-enviable position that we can't really land new Lustre
> features into the staging tree, but we can't get out of the staging tree
> without a lot of code changes.

I don't see this as a problem.  "code changes" and "new fetures" are
largely independent.  There certainly are grey areas, but then the rules
aren't inflexible.  If something feature-like is needed to achieve clean
code, I doubt anyone would object.

I also don't see how your observation explains the lack of gss code in
the staging tree.
I've found
Commit: b78560200d41 ("staging: lustre: ptlrpc: gss: delete unused code")
and am wondering why all the sec*.c code didn't disappear at the same
time.
I guess I'll put 'gss' on my list of things to "fix".

>
> It isn't clear to me why Lustre has such a high bar to cross to get out
> of staging, compared to other filesystems that were added in the past.

I don't see the bar as being all that high, but then I haven't seen any
battles that you might have already fought that seem to show a high bar.

In my mind there are two issues: code quality and maintainership
quality.
Lustre was (is) out-of-tree for a long time with little or no apparent
interesting in going mainline, so that results in a fairly low score for
maintainership.  Maintainership means working in and with the community.
While lustre was out-of-tree it developed lot of useful functionality
which was also independently (and differently) developed in mainline
linux.  This makes the code look worse than it is - because it seems to
be unnecessarily different.
Lustre also previously supported multiple kernels, so it had (has)
abstraction layers which are now just technical debt.
So it is coming from a fairly low starting point.  I think it is
reasonable for the Linux community to want to be able to see lustre clearly
rising above all that past, clearing the debt with interest (you might
say).
This is all quite achievable - it just takes time and effort (and skill
- I think we have that among us).
I've been given time by SUSE to work on this and I can clear a lot of
the technical debt.  The more we work together, the easier it will be.
James has been a great help and I think momentum is building.
The community side needs key people to stop thinking of "us vs them" and
start thinking of lustre as part of the Linux community.  Don't talk
about "upstream has accepted this" or "upstream have rejected that", but
get on board with "we've landed this upstream" or "we are working to
find the best solution for that".
It might help to start using the mailing list more, rather than
depending on whamcloud.  The work-flow is different, but is quite
workable, and shows a willingness to be part of the broader community.

> It's not like Lustre is going away any time soon, unlike some no-name
> ethernet card driver, and we could still continue to improve the Lustre
> code if we could move out of staging, but it could be done in a more
> sustainable manner where we also land feature patches and try to bring
> the kernel and master code closer together over time.

I think we should do one thing at a time.
Firstly, focus on getting out of staging.  Freeze the current feature
set and get all existing features into an acceptable form.
Secondly, add all the fixes and features from master-code into Linux.
If you also want to move fixes from Linux into master, that's up to you.

Thanks,
NeilBrown
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20180523/fe024576/attachment-0001.sig>