[Lustre-devel] New Pools DLD
Andreas Dilger
adilger at sun.com
Sat Apr 26 22:13:05 PDT 2008
On Apr 21, 2008 15:08 +0400, Nikita Danilov wrote:
> Andreas Dilger writes:
> > Since we already need to change the LOV_MAGIC value, we may as well
> > do as you suggest and have 0x0BD3ssss instead of 0x0BD30BD0, where
> > ssss = size of EA in bytes.
> >
> > That would limit us to a 65536-byte striping EA, but it is still larger than
>
> If that happens to be a limiting factor, we can interpret ssss as a
> number of __u32's or __u64's in EA.
The EA size is certainly a multiple of at least __u32, and the current EAs
are also multiples of __u64, so I don't think this is a bad idea at all.
Counting ssss in __u64 units gives us EAs up to 8x larger (about 512kB).
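
A minimal sketch of that encoding in C, assuming the low 16 bits of the magic
count __u64 units. The macro and function names are hypothetical, chosen for
illustration, and are not existing Lustre identifiers:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
#define LOV_MAGIC_V3_BASE    0x0BD30000u /* upper 16 bits: layout identifier */
#define LOV_MAGIC_SIZE_MASK  0x0000FFFFu /* lower 16 bits: EA size field */
#define LOV_EA_UNIT          8u          /* size field counts 8-byte (__u64) units */

/* Build a magic value from an EA size given in bytes. */
static inline uint32_t lov_magic_pack(uint32_t ea_bytes)
{
        assert(ea_bytes % LOV_EA_UNIT == 0);                   /* whole __u64 units only */
        assert(ea_bytes / LOV_EA_UNIT <= LOV_MAGIC_SIZE_MASK); /* must fit in 16 bits */
        return LOV_MAGIC_V3_BASE | (ea_bytes / LOV_EA_UNIT);
}

/* Recover the EA size in bytes from a magic value. */
static inline uint32_t lov_magic_ea_bytes(uint32_t magic)
{
        return (magic & LOV_MAGIC_SIZE_MASK) * LOV_EA_UNIT;
}

/* Check the layout identifier while ignoring the size field. */
static inline int lov_magic_is_v3(uint32_t magic)
{
        return (magic & ~LOV_MAGIC_SIZE_MASK) == LOV_MAGIC_V3_BASE;
}
```

With this scheme the largest representable EA is 0xFFFF * 8 = 524280 bytes,
just under 512kB.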
> I think we need an ability to have fully general files layout, so that,
> for example, a stripe can in turn be a striped file. This can be
> something like
>
> struct lov_mds_md_v3 {
>         __u32 lmm_magic;        /* 0x0BD3ssss */
>         __u32 lmm_pattern;      /* LOV_PATTERN_RAID0, LOV_PATTERN_RAID1 */
>         __u64 lmm_object_id;    /* LOV object ID */
>         __u64 lmm_object_gr;    /* LOV object group */
>         __u32 lmm_stripe_size;  /* size of stripe in bytes */
>         __u32 lmm_stripe_count; /* num stripes in use for this object */
> };
>
> followed by a sequence of stripe layout descriptors each starting with
>
> __u32 magic; /* 0xLLLLSSSS. where LLLL is an identifier of a
> * layout type (e.g., 0bd3 is raid0 or raid1), and
> * SSSS is a size. */
>
> But for a common case of a striped file where all stripes have the same
> layout, we implement a short-cut:
>
> struct lov_mds_md_v4 {
>         __u32 lmm_magic;        /* 0x0BD4ssss */
>         __u32 lmm_pattern;      /* LOV_PATTERN_RAID0, LOV_PATTERN_RAID1 */
>         __u64 lmm_object_id;    /* LOV object ID */
>         __u64 lmm_object_gr;    /* LOV object group */
>         __u32 lmm_stripe_size;  /* size of stripe in bytes */
>         __u32 lmm_stripe_count; /* num stripes in use for this object */
>         __u32 lmm_stripe_magic; /* 0xLLLLSSSS for all stripes */
> };
>
> followed by an array of stripe layout descriptors, stripped of their
> magics.
>
> Or we can go one step further and assume a particular value of
> lmm_stripe_magic for a particular lmm_magic. In this case,
> ->lmm_stripe_magic field can be removed.
This is essentially the mechanism we use today, which is OK for
the very common case of a single record format. I do like the
idea of being able to have a hierarchical LOV EA format, and
have been thinking about that for a long time.
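
As a rough sketch of how a reader might walk the sequence of descriptors
proposed above, each starting with a 0xLLLLSSSS magic: this assumes SSSS is
the descriptor's total size in bytes (magic included) and that the buffer is
already in host byte order. Neither assumption is settled in this thread, and
the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Count the layout descriptors packed back to back in buf[0..len).
 * Returns the number of descriptors, or -1 if the buffer is malformed
 * (truncated magic, size smaller than the magic, or size past the end).
 */
static int layout_count_descriptors(const unsigned char *buf, size_t len)
{
        size_t off = 0;
        int count = 0;

        assert(buf != NULL || len == 0);

        while (off < len) {
                uint32_t magic;
                uint32_t size;

                if (len - off < sizeof(magic))
                        return -1;
                /* memcpy avoids unaligned access on strict architectures */
                memcpy(&magic, buf + off, sizeof(magic));
                size = magic & 0xFFFFu; /* SSSS: assumed size in bytes */
                if (size < sizeof(magic) || size > len - off)
                        return -1;
                /* (magic >> 16) would select the per-type parser here */
                off += size;
                count++;
        }
        return count;
}
```

A nested layout would recurse at the marked point, treating the body of a
descriptor as another sequence of the same form.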
The other thing that I noticed in the pNFS layout is the ability
to have "joined" files in a very simple manner. The "header" part
of the layout (equivalent to our lov_mds_md) also contains the
length in bytes of that part of the file, and then allows multiple
EAs to be concatenated together.
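
To illustrate the idea, here is a sketch of mapping a file offset onto one of
several concatenated layouts when each header carries the length of its part
of the file. The struct and function are hypothetical and show only the
lookup, not the pNFS or Lustre on-disk format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-component header; only the length field is shown. */
struct layout_component {
        uint64_t lc_length; /* bytes of the file covered by this component */
};

/*
 * Find the component containing file offset `offset`.
 * On success returns its index and stores the offset relative to the
 * start of that component in *rel.  Returns -1 if the offset lies past
 * the end of the last component.
 */
static int layout_find_component(const struct layout_component *comps,
                                 size_t ncomps, uint64_t offset,
                                 uint64_t *rel)
{
        uint64_t start = 0; /* file offset where comps[i] begins */
        size_t i;

        assert(comps != NULL || ncomps == 0);
        assert(rel != NULL);

        for (i = 0; i < ncomps; i++) {
                /* offset >= start always holds here, so no underflow */
                if (offset - start < comps[i].lc_length) {
                        *rel = offset - start;
                        return (int)i;
                }
                start += comps[i].lc_length;
        }
        return -1;
}
```

For two components of 100 and 50 bytes, offset 100 resolves to component 1
at relative offset 0, and offset 150 is past the end.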
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.