[lustre-devel] [PATCH] staging: lustre: add error handling for try_module_get
NeilBrown
neilb at suse.com
Wed Jun 13 05:02:55 PDT 2018
On Wed, Jun 13 2018, David Laight wrote:
> From: Zhouyang Jia
>> Sent: 12 June 2018 05:49
>>
>> When try_module_get fails, the lack of error-handling code may
>> cause unexpected results.
>>
>> This patch adds error-handling code after calling try_module_get.
> ...
>> +++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c
>> @@ -2422,7 +2422,10 @@ ksocknal_base_startup(void)
>>
>> /* flag lists/ptrs/locks initialised */
>> ksocknal_data.ksnd_init = SOCKNAL_INIT_DATA;
>> - try_module_get(THIS_MODULE);
>> + if (!try_module_get(THIS_MODULE)) {
>> + CERROR("%s: cannot get module\n", __func__);
>> + goto failed;
>> + }
>
>
> Can try_module_get(THIS_MODULE) ever fail?
Yes.
> Since you are running code in 'THIS_MODULE' the caller must have a
> reference that can't go away.
Not necessarily, though it does usually work that way.
try_module_get() can fail while the exit function is running, but it is
safe to run code in the module until the exit function completes.
So if the exit function takes a lock, then other code can safely run
code in the module while holding the lock, but not holding a reference
to the module. If this code calls try_module_get(), it could fail.
That is exactly what is happening here.
ksoclnd_exit() calls lnet_unregister_lnd() which takes
the_lnet.ln_lnd_mutex.
ksocknal_base_startup() is called from ksocknal_startup()
which is the_ksocklnd.lnd_startup and is called, from
lnet_startup_lndni(), with that lock held.
> So try_module_get() just increments the count that is already greater
> than zero.
>
> Similarly module_put(THIS_MODULE) must never be able to release the
> last reference.
It can if a suitable lock is held.
> Any such calls that aren't in error paths after try_module_get() are
> probably buggy.
Being in an error path doesn't make it safe.
module_put(THIS_MODULE) can only be safe if a lock is held which
prevents the exit function from completing. Some code outside the
module must release the lock.
Having said that, I don't really like this approach. I much prefer for
the module reference to be taken and put outside of the module - it
seems less error-prone.
NeilBrown
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20180613/d4ecae9b/attachment.sig>
More information about the lustre-devel
mailing list