[lustre-devel] Apparent bug in LU-8130 ptlrpc: convert conn_hash to rhashtable / Linux-commit: ac2370ac2bc5215daf78546cd8d925510065bb7f
NeilBrown
neilb at suse.com
Tue Nov 6 14:23:32 PST 2018
On Tue, Nov 06 2018, Patrick Farrell wrote:
> Neil, James,
>
> Sent this earlier, but with nasty formatting that got it rejected. Whoops.
>
> It looks like the patch we landed as:
> LU-8130 ptlrpc: convert conn_hash to rhashtable
>
> Linux has a resizeable hashtable implementation in lib,
> so we should use that instead of having one in libcfs.
>
> This patch converts the ptlrpc conn_hash to use rhashtable.
> In the process we gain lockless lookup.
>
> As connections are never deleted until the hash table is destroyed,
> there is no need to count the reference in the hash table. There
> is also no need to enable automatic_shrinking.
>
> Linux-commit: ac2370ac2bc5215daf78546cd8d925510065bb7f
>
> Introduced a bug. Ihara-san opened something to track it here:
> https://jira.whamcloud.com/browse/LU-11624
>
> It’s a null pointer in nid_hash(); there are some more details at that link.
That commit changed ptlrpc_connection_get() so that it can return NULL.
This is actually one of the really annoying things about rhashtable. I
cannot convince the maintainers that sometimes you need guaranteed
success (which is quite possible with a hash table). They come from the
networking community were they drop packets on errors all the time - no
big deal.
Anyway, all users of ptlrpc_connection_get() in the client code (in
drivers/staging) test the return value against NULL, so this was safe.
In SFS lustre, the call in target_handle_connect() assigned the
result to export->exp_connection without checking against NULL.
That leads to the oops.
ptlrpc_connection_get() will return NULL when
rhashtable_lookup_get_insert_fast() returns an error.
The errors are mostly weird internal implementation details leaking out
the API. A retry, possibly after a short delay, will usually succeed.
(see https://lwn.net/Articles/751974/, section "Failure during insertion)
If this happens a lot, it might suggest that the hash function is poor.
Ahhhh... this was built against a 3.10 kernel.
Prior to 4.6, hash_64() was not so much "poor" as "appalling".
See https://lwn.net/Articles/687494/
For SFS we should avoid using hash_64() (at least before 4.6) and
instead use something like
hash_32((foo>>32) ^ hash_32(foo & 0xFFFFFFFF))
NeilBrown
>
> We’re seeing it at Cray as well, when testing the current WhamCloud branch.
>
> Basically, when we fail over an MDS(/MDT) under load (ie with real activity on the file system) we hit this panic about 30-50% of the time right now. I assume it’s possible on OSSes as well but we haven’t seen it there.
>
> I haven’t done any detailed investigation, but I thought I’d bring it to your attention. Per Ihara-san in LU-11624, the crash does not happen without the commit listed above.
>
> • Patrick
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20181107/9aae45cc/attachment.sig>
More information about the lustre-devel
mailing list