[lustre-devel] Lustre use of hash_long() & cfs_hash_u32_hash()

James Simmons jsimmons at infradead.org
Mon May 16 10:28:20 PDT 2016


> > Looking at our code I see our duplication is cfs_hash_u32_hash and
> > cfs_hash_u64_hash which could be replaced by the standard linux functions.
> > Am I missing anything else?
> 
> The question is, why were they copied in the first place?
...
> If there's no reason for the duplication, then yes, merge them.

As Oleg pointed out, there is no need for the duplication.

> >> Since Lustre was the single biggest culprit (about 25% of that patch),
> >> I was planning on sending a broken-out patch.
> 
> > I expect this is not all the changes needed. Do you have a newer patch or 
> > should I run with this patch?  Also I will look into replace the 
> > cfs_hash_u[32|64]_* code with standard linux hash code.
> 
> I don't have anything newer ATM.  I agree there's probably more; if
> nothing else I didn't check the copied hash functions at all.  What I
> posted was just what I noticed during a search through the kernel for
> all users of the functions I was planning on changing.

I started a patch with your cleanups and began looking at replacing
cfs_hash_u[64|32]_hash() with the standard hash_[64|32]() functions.
First I need to explain the code path to make clear what is going on.
From the top level, the code path in which cfs_hash_uXX_hash() might
be called is:

cfs_hash_bd_get() -> cfs_hash_bd_from_key() -> cfs_hash_id()

cfs_hash_id() is defined as:

static inline unsigned
cfs_hash_id(struct cfs_hash *hs, const void *key, unsigned mask)
{
        return hs->hs_ops->hs_hash(hs, key, mask);
}

and hs_hash is declared as:

struct cfs_hash_ops {
        /** return hashed value from @key */
        unsigned (*hs_hash)(struct cfs_hash *hs, const void *key,
                            unsigned mask);
        ...
}

So ops->hs_hash() is filled in by the various Lustre layers to generate
a hash value. It is these hs_hash() implementations that sometimes use
cfs_hash_u64_hash() and cfs_hash_u32_hash(). An example is in
cl_object.c:

struct cfs_hash_ops pool_hash_operations = {
        .hs_hash        = cl_env_hops_hash,
	...
};

which is defined as:

static unsigned cl_env_hops_hash(struct cfs_hash *lh,
                                 const void *key, unsigned mask)
{
#if BITS_PER_LONG == 64
        return cfs_hash_u64_hash((__u64)key, mask);
#else
        return cfs_hash_u32_hash((__u32)key, mask);
#endif
}

If we switch to hash_64() or hash_32() instead, this will change
the behavior we currently have.

Now cfs_hash_u64_hash() is defined as

static inline unsigned
cfs_hash_u64_hash(const __u64 key, unsigned mask)
{
        return ((unsigned)(key * CFS_GOLDEN_RATIO_PRIME_64) & mask);
}

which applies a mask directly. That mask is generated at the call
to cfs_hash_id():

unsigned int index = cfs_hash_id(hs, key, (1U << bits) - 1);

So the mask selects the lower bits. We can't change this interface,
since other hs_ops->hs_hash() implementations that don't use
cfs_hash_u[64|32]_hash() at all still treat the argument as a mask.
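The low-bits vs. high-bits distinction can be seen in a small
userspace sketch. The constant values and the hash_64() body below
mirror what I believe the kernel code looks like, but treat them as
illustrative assumptions, not authoritative copies:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of libcfs' CFS_GOLDEN_RATIO_PRIME_64; illustrative only. */
#define CFS_GOLDEN_RATIO_PRIME_64 0x9e37fffffffc0001ULL

/* Lustre style: multiply, then keep the LOW bits via the mask. */
static unsigned cfs_hash_u64_hash(uint64_t key, unsigned mask)
{
	return (unsigned)(key * CFS_GOLDEN_RATIO_PRIME_64) & mask;
}

/* Assumed GOLDEN_RATIO_64 from the reworked <linux/hash.h>. */
#define GOLDEN_RATIO_64 0x61c8864680b583ebULL

/* hash_64() style: multiply, then keep the HIGH bits by shifting. */
static unsigned hash_64_demo(uint64_t val, unsigned bits)
{
	return (unsigned)((val * GOLDEN_RATIO_64) >> (64 - bits));
}
```

Both return a bucket index in [0, 2^bits), but for the same key they
will generally land in different buckets, since one keeps the low
bits of the product and the other keeps the high bits.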

Now, the standard function hash_64() is defined as:

static __always_inline u64 hash_64(u64 val, unsigned int bits)
{
        u64 hash = val;

#if BITS_PER_LONG == 64
        hash = hash * GOLDEN_RATIO_64;
#else
        ....
#endif

        /* High bits are more random, so use them. */
        return hash >> (64 - bits);
}

which selects the higher bits by shifting instead of masking.
I can rework the code so the mask is turned into a bit count,
but the resulting bucket indices will still differ. Will this
difference break things?
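Recovering the bit count from the mask is mechanical, since the mask
is always (1U << bits) - 1. A sketch, using a GCC builtin where the
kernel would use ilog2() or hweight32():

```c
#include <assert.h>

/* mask is all-ones in the low bits, so its popcount is exactly the
 * bit count.  In-kernel, ilog2(mask + 1) or hweight32(mask) would
 * serve the same purpose. */
static unsigned mask_to_bits(unsigned mask)
{
	return (unsigned)__builtin_popcount(mask);
}
```

With this, a caller could pass mask_to_bits(mask) to hash_64() in
place of the mask; the open question above about differing bucket
distribution remains either way.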
