[lustre-discuss] default directory striping with Lustre 2.8

Dilger, Andreas andreas.dilger at intel.com
Thu May 5 03:42:40 PDT 2016


On 2016/05/04, 17:04, "Faaland, Olaf P." <faaland1 at llnl.gov> wrote:
>Hi,
>
>Suppose you have m MDTs in your filesystem, and create a new directory
>and set default directory striping using
>
>
>lfs mkdir --count=c --index=k <path> && lfs setdirstripe --default --count=c <path>
>
>Suppose that c < m and m > 2.
>
>Then you make subdirectories, like
>
>mkdir <path>/child.{1,2,3,...}
>
>a) By design, do the child directories have the same starting index as
><path>?
>b) By design, are the child directories all striped across the same set
>of MDTs as <path>?
>

Neither is exactly correct...

If <path> is a non-striped directory, then the "child.*" inode will always
reside on the same MDT as <path> and (a) is currently true - all the
directory shards will start on the same MDT that <path> is on.

If <path> is a striped directory, then the MDT on which the "child.*"
inode resides depends on the hash of the filename and the striping
parameters, if not otherwise specified.  The directory shards of each
"child.*" directory will start on the same MDT that the "child.*" inode
is on.  This will distribute the directory shards across all the MDTs in
a uniform manner.
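
As a rough illustration of the shard placement described above, the sketch
below lists the MDT indices a striped directory would occupy if its shards
sit on consecutive MDTs from the starting index and wrap around at the last
MDT. That layout is an assumption read off the test output quoted further
down, not a statement of the implementation. The helper name mdt_set and the
total of 10 MDTs (indices 0-9 in that output) are likewise assumptions.

```shell
#!/bin/sh
# Hypothetical helper, not an lfs command.
# Usage: mdt_set START COUNT TOTAL
# Prints COUNT MDT indices beginning at START, assuming consecutive
# placement that wraps around modulo TOTAL (the number of MDTs).
mdt_set() {
    start=$1 count=$2 total=$3
    i=0
    out=""
    while [ "$i" -lt "$count" ]; do
        # Append the next index, separated by a space after the first one.
        out="$out${out:+ }$(( (start + i) % total ))"
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

mdt_set 5 6 10   # prints: 5 6 7 8 9 0  (same set as subdir.9 in the test)
mdt_set 7 6 10   # prints: 7 8 9 0 1 2  (same set as subdir.14 in the test)
```

Under these assumptions, only the starting index differs between children,
which is why the choice of the MDT holding the "child.*" inode decides which
set of MDTs each child directory ends up on.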

There is currently no mechanism to dynamically balance the MDT selection
based on space usage or similar, so it is round-robin based on the
starting MDT index.  It would also be beneficial to allow specifying a
default starting index of "-1" to distribute the directory shards across
all MDTs if the parent directory is not striped, and if no explicit
starting MDT index is given.  We are hoping to implement this in the next
few months, depending on availability.


Cheers, Andreas


>I didn't see that specified one way or the other in the DNE phase 2 high
>level design document at
><http://wiki.opensfs.org/DNE_StripedDirectories_HighLevelDesign_wiki_version>.
>If I should look elsewhere, let me know.
>
>In a test I was doing today, I noticed that neither (a) nor (b) was true
>in practice.  I'm wondering whether that's a bug or a feature.  Here's
>partial output from my test.
>
>$ lfs mkdir --count=6 --index=2 /p/lustre/faaland1/count6_index2
>$ lfs setdirstripe -D --count=6 /p/lustre/faaland1/count6_index2
>$ mkdir /p/lustre/faaland1/count6_index2/subdir.{1,2,3,4,5,6,7,8,9,10,11,12,13,14}
>$ lfs getdirstripe /p/lustre/faaland1/count6_index2
>/p/lustre/faaland1/count6_index2
>lmv_stripe_count: 6 lmv_stripe_offset: 2
>mdtidx           FID[seq:oid:ver]
>     2           [0x280000400:0x33f3:0x0]
>     3           [0x2c0000404:0x33f3:0x0]
>     4           [0x300000402:0x33f2:0x0]
>     5           [0x340000407:0x33f1:0x0]
>     6           [0x380000406:0x33f0:0x0]
>     7           [0x3c0000404:0x33ef:0x0]
>/p/lustre/faaland1/count6_index2/subdir.4
>lmv_stripe_count: 6 lmv_stripe_offset: 2
>mdtidx           FID[seq:oid:ver]
>     2           [0x280000400:0x33f5:0x0]
>     3           [0x2c0000404:0x33f5:0x0]
>     4           [0x300000402:0x33f4:0x0]
>     5           [0x340000407:0x33f3:0x0]
>     6           [0x380000406:0x33f2:0x0]
>     7           [0x3c0000404:0x33f1:0x0]
>/p/lustre/faaland1/count6_index2/subdir.9
>lmv_stripe_count: 6 lmv_stripe_offset: 5
>mdtidx           FID[seq:oid:ver]
>     5           [0x340000400:0x37a1:0x0]
>     6           [0x380000405:0x37a1:0x0]
>     7           [0x3c0000402:0x37a0:0x0]
>     8           [0x40000040e:0x379f:0x0]
>     9           [0x440000403:0x379e:0x0]
>     0           [0x200000405:0x379d:0x0]
>/p/lustre/faaland1/count6_index2/subdir.3
>lmv_stripe_count: 6 lmv_stripe_offset: 5
>mdtidx           FID[seq:oid:ver]
>     5           [0x340000400:0x37a0:0x0]
>     6           [0x380000405:0x37a0:0x0]
>     7           [0x3c0000402:0x379f:0x0]
>     8           [0x40000040e:0x379e:0x0]
>     9           [0x440000403:0x379d:0x0]
>     0           [0x200000405:0x379c:0x0]
>/p/lustre/faaland1/count6_index2/subdir.14
>lmv_stripe_count: 6 lmv_stripe_offset: 7
>mdtidx           FID[seq:oid:ver]
>     7           [0x3c0000400:0x30d4:0x0]
>     8           [0x400000403:0x30d4:0x0]
>     9           [0x440000405:0x30d3:0x0]
>     0           [0x200000407:0x30d2:0x0]
>     1           [0x240000407:0x30d1:0x0]
>     2           [0x280000407:0x30d0:0x0]
>...
>
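
To compare starting indices across many subdirectories without reading every
FID line, output like the above can be condensed to one line per directory.
The sketch below assumes the `lfs getdirstripe` output format exactly as it
appears in this test (a path line followed by an "lmv_stripe_count: ...
lmv_stripe_offset: ..." line); other Lustre versions may print a different
layout. The function name summarize_stripes is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: reads getdirstripe-style output on stdin and prints
# "<path> count=<stripe count> offset=<starting MDT index>" per directory.
summarize_stripes() {
    awk '
        # A line starting with "/" names the directory being described.
        /^\// { path = $0 }
        # Field 2 is the stripe count, field 4 the starting MDT index.
        /^lmv_stripe_count:/ {
            printf "%s count=%s offset=%s\n", path, $2, $4
        }
    '
}

# Sample input copied from the test output (FID lines omitted for brevity).
summarize_stripes <<'EOF'
/p/lustre/faaland1/count6_index2
lmv_stripe_count: 6 lmv_stripe_offset: 2
/p/lustre/faaland1/count6_index2/subdir.9
lmv_stripe_count: 6 lmv_stripe_offset: 5
EOF
```

On a live filesystem the same function could be fed directly, for example
`lfs getdirstripe <path> | summarize_stripes`; the mdtidx/FID lines are
ignored because they match neither pattern.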
>
>Olaf P. Faaland
>Livermore Computing
>phone : 925-422-2263 


