[Lustre-devel] readdir for striped dir
Tom.Wang at Sun.COM
Tue Mar 23 04:29:16 PDT 2010
Andreas Dilger wrote:
> On 2010-03-22, at 17:09, Tom.Wang wrote:
>> In CMD, one directory can be striped to several MDTs according to the
>> hash value of each entry (calculated by its name). Suppose we have N
>> MDTs and hash range is 0 to MAX_HASH. First server will keep records
>> with hashes [0 ... MAX_HASH / N - 1], second one with hashes
>> [MAX_HASH / N ... 2 * MAX_HASH / N] and so on. Currently, it uses the same hash
>> policy as the one used on disk (ldiskfs/ext3 hash), so when reading a
>> striped directory, the entries from different stripe objects can be
>> mapped into the client-side cache simply; this page cache is actually
>> maintained only in the llite layer. But this binding of the CMD
>> split-dir protocol to the on-disk hash does not seem good, and it
>> causes even more problems when porting the MDT to kdmu.
>> This dir-entry page cache should be moved to the mdc layer, and each
>> object will have its own page cache. Finding an entry in the page
>> cache will need 2 lookups: first locating the stripe object
>> (ll_stripe_offset will be added to ll_file_data to indicate the
>> stripe offset), then getting the page by offset (f_pos) inside the
>> stripe_object. The entry page cache can even be organized to suit
>> different purposes, for example readdir-plus or dir-extent locks.
>> We can reuse the cl_page on the mdc layer, but that might need object
>> layering on the metadata stack. As a first step, we could probably
>> register some page callbacks for mdc to manage the page cache.
> Tom, could you please explain the proposed mechanism for hashing in
> this scheme? Will there be one hash function at the LMV level to
> select the dirstripe, and a second one at the MDC level? Does this
> imply that the client still needs to know the hashing scheme used by
> the backing storage? At least this allows a different hash per
> dirstripe, which is important for DMU because the hash seed is
> different for each directory.
The client does not need to know the hash scheme of the backing storage.
LMV will use a new hash function to select the stripe object (mdc), and
that function can be independent of the one used by the storage. At the
mdc level, the client just needs to map the entries of each dir stripe
object into the cache. We can index the cache any way we want; hash
order (as on the server storage) is probably a good choice, because the
client can then easily find and cancel pages by hash once dir-extent
locks are added. Note: even in this case, the client does not need to
know the server hash scheme at all, since the server sets the
hash-offset of these pages and the client just needs to put the pages
into the cache by hash-offset.
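To make the two levels concrete, here is a rough Python sketch (an
illustration only, not Lustre code). It assumes a 32-bit hash space, uses
crc32 as a stand-in for the new LMV-level hash, and invents the names
select_stripe, StripePageCache and StripedDirCache. The per-stripe cache
never hashes anything itself; it only orders pages by the hash-offset
that the server attached to them.

```python
import bisect
import zlib

MAX_HASH = 2**32  # assumed size of the LMV-level hash space


def select_stripe(name, stripe_count):
    """LMV level: pick a stripe by range-partitioning a client-side hash.

    crc32 is only a placeholder; it is unrelated to the on-disk hash.
    """
    h = zlib.crc32(name.encode("utf-8"))  # 0 .. MAX_HASH - 1
    return h * stripe_count // MAX_HASH


class StripePageCache:
    """MDC level: pages of one stripe object, ordered by hash-offset."""

    def __init__(self):
        self._starts = []  # sorted hash-offsets, as set by the server
        self._pages = {}   # hash-offset -> list of entries in that page

    def insert(self, hash_offset, entries):
        if hash_offset not in self._pages:
            bisect.insort(self._starts, hash_offset)
        self._pages[hash_offset] = entries

    def find(self, f_pos):
        # Simplification: the page with the largest start <= f_pos is
        # assumed to cover f_pos (page end offsets are not tracked).
        i = bisect.bisect_right(self._starts, f_pos) - 1
        if i < 0:
            return None
        return self._pages[self._starts[i]]

    def cancel_range(self, lo, hi):
        """Drop pages whose start offset lies in [lo, hi], as a
        dir-extent lock cancellation might; returns the number dropped."""
        i = bisect.bisect_left(self._starts, lo)
        j = bisect.bisect_right(self._starts, hi)
        for start in self._starts[i:j]:
            del self._pages[start]
        del self._starts[i:j]
        return j - i


class StripedDirCache:
    """Two-step lookup: stripe offset first, then f_pos in that stripe."""

    def __init__(self, stripe_count):
        self.stripes = [StripePageCache() for _ in range(stripe_count)]

    def lookup(self, stripe_offset, f_pos):
        return self.stripes[stripe_offset].find(f_pos)
```

Keeping the pages in hash order is what makes cancel_range a pair of
binary searches rather than a scan over every cached page.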
Currently, the cache is only touched by readdir. If the cache is later
used by readdir-plus, i.e. we need to locate an entry by name, then the
client must use the same hash as the server storage, but the server will
tell the client which hash function it uses. Yes, a different hash per
dirstripe should not be a problem here.
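For the readdir-plus case, a minimal sketch of what "the server tells
the client which hash function it uses" could look like, again in Python
and again hypothetical: the HASH_FUNCS table, the function names in it
and the StripeNameIndex class are made up for the example, and
crc32/adler32 merely stand in for whatever the backing storage really
uses. Each stripe carries its own function and seed, so a different hash
per dirstripe costs nothing extra.

```python
import zlib
from bisect import bisect_right, insort

# Hypothetical table of hash functions a server could advertise by name.
HASH_FUNCS = {
    "crc32": lambda name, seed: zlib.crc32(name, seed),
    "adler32": lambda name, seed: zlib.adler32(name, seed),
}


class StripeNameIndex:
    """Per-stripe cache that can also be searched by entry name."""

    def __init__(self, hash_name, seed):
        # Raises KeyError if the server names a function we do not know.
        self.hash_func = HASH_FUNCS[hash_name]
        self.seed = seed
        self.starts = []  # sorted hash-offsets set by the server
        self.pages = {}   # hash-offset -> set of names in that page

    def add_page(self, hash_offset, names):
        if hash_offset not in self.pages:
            insort(self.starts, hash_offset)
        self.pages[hash_offset] = set(names)

    def has_name(self, name):
        """True/False if a cached page decides it, None if nothing is
        cached at or below the hash of this name.

        Simplification: the page with the largest start <= hash is
        assumed to cover that hash.
        """
        h = self.hash_func(name, self.seed)
        i = bisect_right(self.starts, h) - 1
        if i < 0:
            return None
        return name in self.pages[self.starts[i]]
```

Names are passed as bytes here because the zlib functions operate on
bytes; a None result would mean the client has to ask the server.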
> Cheers, Andreas
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.