[Lustre-devel] New Pools DLD

Andreas Dilger adilger at sun.com
Sat Apr 26 22:13:05 PDT 2008


On Apr 21, 2008  15:08 +0400, Nikita Danilov wrote:
> Andreas Dilger writes:
>  > Since we already need to change the LOV_MAGIC value, we may as well
>  > do as you suggest and have 0x0BD3ssss instead of 0x0BD30BD0, where
>  > ssss = size of EA in bytes.
>  > 
>  > That would limit us to a 65536-byte striping EA, but it is still larger than
> 
> If that happens to be a limiting factor, we can interpret ssss as a
> number of __u32's or __u64's in EA.

The EA size is certainly always a multiple of __u32, and the current EAs
are also multiples of __u64, so I don't think this is a bad idea at all.
Counting in __u64 units lets EAs grow 8x larger, up to 512kB.

> I think we need the ability to have fully general file layouts, so that,
> for example, a stripe can in turn be a striped file. This can be
> something like
> 
> struct lov_mds_md_v3 {
>         __u32 lmm_magic;          /* 0x0BD3ssss */
>         __u32 lmm_pattern;        /* LOV_PATTERN_RAID0, LOV_PATTERN_RAID1 */
>         __u64 lmm_object_id;      /* LOV object ID */
>         __u64 lmm_object_gr;      /* LOV object group */
>         __u32 lmm_stripe_size;    /* size of stripe in bytes */
>         __u32 lmm_stripe_count;   /* num stripes in use for this object */
> };
> 
> followed by a sequence of stripe layout descriptors, each starting with
> 
>         __u32 magic; /* 0xLLLLSSSS, where LLLL is an identifier of a
>                       * layout type (e.g., 0bd3 is raid0 or raid1), and
>                       * SSSS is a size. */
> 
> But for a common case of a striped file where all stripes have the same
> layout, we implement a short-cut:
> 
> struct lov_mds_md_v4 {
>         __u32 lmm_magic;          /* 0x0BD4ssss */
>         __u32 lmm_pattern;        /* LOV_PATTERN_RAID0, LOV_PATTERN_RAID1 */
>         __u64 lmm_object_id;      /* LOV object ID */
>         __u64 lmm_object_gr;      /* LOV object group */
>         __u32 lmm_stripe_size;    /* size of stripe in bytes */
>         __u32 lmm_stripe_count;   /* num stripes in use for this object */
>         __u32 lmm_stripe_magic;   /* 0xLLLLSSSS for all stripes */
> };
> 
> followed by an array of stripe layout descriptors, stripped of their
> magics.
> 
> Or we can go one step further and assume a particular value of
> lmm_stripe_magic for each lmm_magic. In this case, the
> ->lmm_stripe_magic field can be removed.

This is essentially the mechanism we use today, which is OK for
the very common case of a single record format.  I do like the
idea of being able to have a hierarchical LOV EA format, and
have been thinking about that for a long time.

The other thing that I noticed in the pNFS layout is the ability
to have "joined" files in a very simple manner.  The "header" part
of the layout (equivalent to our lov_mds_md) also contains the
length in bytes of that part of the file, and then allows multiple
EAs to be concatenated together.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



