[Lustre-devel] New Pools DLD
Nikita Danilov
Nikita.Danilov at Sun.COM
Mon Apr 21 04:08:18 PDT 2008
Andreas Dilger writes:
> On Feb 27, 2008 14:50 +0300, Nikita Danilov wrote:
> > Andreas Dilger writes:
> > > are you aware of any desirable LOV EA changes that would be good to
> > > include with the changes for the v3 "pool" EA (attached)? Are there
> > > any changes that are desirable for e.g. FIDs or similar?
> >
> > I think that if we are introducing an incompatible LOV EA format, we may as
> > well go forward with the changes hinted at in
> >
> > http://arch.lustre.org/index.php?title=MDS_striping_format#Future_developments
>
> Nikita, sorry to take a long time to get back to this issue, but I think
> it is quite valuable to pursue if we are already going to change the
> on-disk format.
>
> Since we already need to change the LOV_MAGIC value, we may as well
> do as you suggest and have 0x0BD3ssss instead of 0x0BD30BD0, where
> ssss = size of EA in bytes.
>
> That would limit us to a 65536-byte striping EA, but it is still larger than
If that happens to be a limiting factor, we can interpret ssss as the
number of __u32s or __u64s in the EA.
> what is supported today, and the plans for wide striping also do not call
> for larger EAs. Even supporting 64kB EAs would be an issue with the
> current infrastructure because the client always has to preallocate a
> receive buffer large enough for the largest EA because it does not know
> the EA size in advance.
>
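The size-in-magic encoding discussed above, including the option of counting ssss in __u32 or __u64 units, could be sketched roughly as follows. The helper names and the unit_shift parameter are hypothetical, chosen only for illustration; nothing here is existing Lustre code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; none of these names exist in Lustre. */

/* High 16 bits: format tag (e.g. 0x0BD3). Low 16 bits: size field "ssss". */
static inline uint16_t ea_magic_tag(uint32_t magic)
{
        return (uint16_t)(magic >> 16);
}

static inline uint16_t ea_magic_ssss(uint32_t magic)
{
        return (uint16_t)(magic & 0xffffu);
}

/*
 * Build a magic whose size field counts units of (1 << unit_shift) bytes:
 * unit_shift 0 = bytes, 2 = __u32 words, 3 = __u64 words (an assumption).
 */
static inline uint32_t ea_magic_make(uint16_t tag, uint32_t bytes,
                                     unsigned unit_shift)
{
        uint32_t units = bytes >> unit_shift;

        assert((units << unit_shift) == bytes); /* size must be unit-aligned */
        assert(units <= 0xffffu);               /* must fit in 16 bits */
        return ((uint32_t)tag << 16) | units;
}

/* Recover the EA size in bytes from a magic built with the same unit_shift. */
static inline uint32_t ea_magic_bytes(uint32_t magic, unsigned unit_shift)
{
        return (uint32_t)ea_magic_ssss(magic) << unit_shift;
}
```

With byte units the size field tops out just under 64kB; counting in __u32 units raises that ceiling fourfold at the cost of requiring 4-byte-aligned EA sizes.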
> The question is whether you think we should also add a magic + size to the
> lov_ost_data_v1 structure, which is currently the same for all EA types.
> Adding a per-stripe magic + size would reduce the number of stripes we
> can allocate per file, and the 160-stripe limit is already a problem for
> some systems with more than 160 OSTs.
I think we need the ability to describe a fully general file layout, so
that, for example, a stripe can in turn be a striped file. This can be
something like
struct lov_mds_md_v3 {
        __u32 lmm_magic;        /* 0x0BD3ssss */
        __u32 lmm_pattern;      /* LOV_PATTERN_RAID0, LOV_PATTERN_RAID1 */
        __u64 lmm_object_id;    /* LOV object ID */
        __u64 lmm_object_gr;    /* LOV object group */
        __u32 lmm_stripe_size;  /* size of stripe in bytes */
        __u32 lmm_stripe_count; /* num stripes in use for this object */
};
followed by a sequence of stripe layout descriptors, each starting with
        __u32 magic;    /* 0xLLLLSSSS, where LLLL is an identifier of a
                         * layout type (e.g., 0bd3 is raid0 or raid1), and
                         * SSSS is a size. */
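A minimal sketch of how a reader might walk such a sequence of self-describing descriptors. It assumes that SSSS is the byte size of the whole descriptor including its 4-byte magic, and that the magic is stored in host byte order; neither is settled above. desc_walk and desc_cb_t are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback: gets the layout type (LLLL), the descriptor, its size. */
typedef int (*desc_cb_t)(uint16_t type, const void *desc, uint16_t size,
                         void *arg);

/*
 * Walk descriptors packed back to back in buf[0..len).
 * Assumption: SSSS is the byte size of the whole descriptor, magic included.
 * Returns the number of descriptors visited, or -1 on a malformed buffer
 * or when the callback returns non-zero.
 */
static int desc_walk(const unsigned char *buf, size_t len, desc_cb_t cb,
                     void *arg)
{
        size_t off = 0;
        int n = 0;

        assert(buf != NULL || len == 0);
        while (off < len) {
                uint32_t magic;
                uint16_t size;

                if (len - off < sizeof(magic))
                        return -1;      /* truncated magic */
                memcpy(&magic, buf + off, sizeof(magic)); /* no unaligned read */
                size = (uint16_t)(magic & 0xffffu);
                if (size < sizeof(magic) || size > len - off)
                        return -1;      /* size field inconsistent with buffer */
                if (cb != NULL &&
                    cb((uint16_t)(magic >> 16), buf + off, size, arg) != 0)
                        return -1;
                off += size;
                n++;
        }
        return n;
}
```

Because every descriptor carries its own size, a reader can skip layout types it does not understand, and a callback could call desc_walk again on a descriptor body to handle a stripe that is itself a striped file.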
But for a common case of a striped file where all stripes have the same
layout, we implement a short-cut:
struct lov_mds_md_v4 {
        __u32 lmm_magic;        /* 0x0BD4ssss */
        __u32 lmm_pattern;      /* LOV_PATTERN_RAID0, LOV_PATTERN_RAID1 */
        __u64 lmm_object_id;    /* LOV object ID */
        __u64 lmm_object_gr;    /* LOV object group */
        __u32 lmm_stripe_size;  /* size of stripe in bytes */
        __u32 lmm_stripe_count; /* num stripes in use for this object */
        __u32 lmm_stripe_magic; /* 0xLLLLSSSS for all stripes */
};
followed by an array of stripe layout descriptors, stripped of their
magics.
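With one shared magic, all entries have the same size, so a stripe descriptor can be located by index instead of by walking. A rough sketch, assuming SSSS in lmm_stripe_magic counts the full descriptor including its magic (so each stored entry is 4 bytes shorter) and that the caller supplies the header length; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers; the interpretation of SSSS is an assumption. */

/* Bytes occupied by one stored entry, i.e. a descriptor minus its magic. */
static size_t v4_entry_bytes(uint32_t stripe_magic)
{
        size_t full = stripe_magic & 0xffffu;

        assert(full >= sizeof(uint32_t)); /* must at least cover the magic */
        return full - sizeof(uint32_t);
}

/* Byte offset of stripe descriptor idx from the start of the EA. */
static size_t v4_entry_offset(size_t hdr_bytes, uint32_t stripe_magic,
                              uint32_t idx)
{
        return hdr_bytes + (size_t)idx * v4_entry_bytes(stripe_magic);
}
```

Compared with per-stripe magics, this spends 4 bytes once in the header rather than 4 bytes per stripe, which matters when the stripe count is bounded by the EA size.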
Or we can go one step further and assume a particular value of
lmm_stripe_magic for each lmm_magic. In that case, the
->lmm_stripe_magic field can be removed.
>
> Cheers, Andreas
> --
Nikita.