[Lustre-devel] Summary of our HSM discussion

Peter Braam Peter.Braam at Sun.COM
Thu Aug 14 09:38:52 PDT 2008

There are rather a lot of questions here - let's give this a go.

On 8/12/08 12:40 PM, "Kevan" <kfr at sgi.com> wrote:

> Peter,
> Apologies, but I am having some difficulty determining where the
> boundary between Lustre and the HSM lies within the first release.
> Plus I have a few newbie Lustre questions.
> I think your #4 is saying that Lustre will still provide a Space
> Manager in the first release, responsible for monitoring filesystem
> fullness, using e2scan to pick archive/purge candidates and issuing
> archive/purge requests to the HSM, and that these tasks are not
> performed by the HSM itself.

The list will be generated by e2scan initially, and in due course by a more
efficient and scalable LRU log (see the HLD).

The list will be digested and acted upon by the HSM policy manager.
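As a rough illustration of what "digested and acted upon" might mean, here is a minimal sketch of a policy manager pass over an e2scan-style candidate list. The record format, thresholds, and function names are assumptions for illustration only, not the actual Lustre or HSM interface:

```python
def digest(candidates, fs_used, fs_total, now,
           archive_age=7 * 24 * 3600, purge_watermark=0.80):
    """Digest an LRU-ordered candidate list into archive/purge requests.

    candidates: (fid, atime, size, archived) tuples, oldest atime first.
    Returns (archive_list, purge_list) of FIDs.  Illustrative only."""
    archive, purge = [], []
    used = fs_used
    for fid, atime, size, archived in candidates:
        if not archived and now - atime > archive_age:
            archive.append(fid)           # idle long enough to copy out
        elif archived and used / fs_total > purge_watermark:
            purge.append(fid)             # data already safe in the HSM
            used -= size                  # purging frees this space
    return archive, purge
```

Because the list arrives oldest-first, purging stops as soon as projected usage drops below the watermark, which matches the "monitor fullness, pick candidates" role described above.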

> True?
> Is the Space Manager logic part of the Coordinator, or is it a
> separate entity?


> Is there one Coordinator/Space Manager pair per filesystem or one
> total per site?

List generation will probably be per server target (per MDT or OST, TBD).
Rick Matthews can tell us whether the policy manager manages sites or file
systems.

> Users will need commands that allow them to archive and recall their
> files and/or directory subtrees, and they will want commands like ls
> and find that show them the current HSM archive/purge state of their
> files so that they can pre-stage purged files before they are needed,
> and so that they can purge unneeded files to effectively manage their
> own quotas.  Will these commands be provided by Lustre, or by the HSM?

These will be commands issued to Lustre, as extensions of the "lfs" commands.

> Given that files are only recalled on open, this implies that a file
> which is open for either read or write by any client can never be
> purged, correct?


> And a file open for write by any client should never be archived,
> since it could be silently changing while the archive is in progress.

If the HSM is used for backup, one probably wants to back up the file
anyway; this is a decision for the policy manager.

> And if a file is opened for write after an archive has begun, will
> the HSM be sent a cancel request?

The file system will generate events; the policy manager can decide how to
act on them.
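To make the event-driven division of labour concrete, here is a minimal sketch of the kind of decision the policy manager might take per event. Event names, fields, and action tuples are illustrative assumptions, not the actual changelog format:

```python
def react(event, archives_in_progress):
    """Decide on an action for one file-system event.

    archives_in_progress: set of FIDs with an archive copy running.
    Illustrative sketch only; not the real Lustre event interface."""
    if event["type"] == "open_write" and event["fid"] in archives_in_progress:
        # The copy being made would be stale, so cancel it.
        return ("cancel_archive", event["fid"])
    if event["type"] == "close_write":
        # File changed; it becomes a candidate for (re-)archiving.
        return ("mark_dirty", event["fid"])
    return None  # nothing to do for this event
```

The point of the sketch is that the file system only reports what happened; whether an open-for-write cancels an in-flight archive is entirely the policy manager's choice.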

> Is the necessary information available to the Space Manager and/or
> Coordinator so that these rules can be enforced?
> The HSM data mover needs to be able to open a file by FID without
> encountering the adaptive timeout that other users are seeing.  The
> data mover's I/Os must not change the file's read and write
> timestamps.  The data mover needs a get_extents(int fd) function to
> read the file's extent map so that it can find the location of holes
> in sparse files and preserve those holes within its HSM copies.  Is
> there an interface available that provides this functionality?

Planned in detail.  See HLD.
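For a feel of what a get_extents-style call gives the data mover, here is a user-level approximation using lseek(SEEK_HOLE/SEEK_DATA), which modern POSIX systems provide. This is only an illustration of the requirement, not the interface the HLD specifies:

```python
import os

def find_holes(fd):
    """Return (offset, length) pairs for the holes in an open file.

    A user-level approximation of the get_extents(fd) call discussed
    above, built on lseek(SEEK_HOLE/SEEK_DATA); illustrative only."""
    size = os.fstat(fd).st_size
    holes, pos = [], 0
    while pos < size:
        hole = os.lseek(fd, pos, os.SEEK_HOLE)
        if hole >= size:
            break                        # only the implicit hole at EOF
        try:
            data = os.lseek(fd, hole, os.SEEK_DATA)
        except OSError:                  # ENXIO: the hole runs to EOF
            holes.append((hole, size - hole))
            break
        holes.append((hole, data - hole))
        pos = data
    return holes
```

A data mover with this map can skip the holes on archive and recreate them on recall, instead of materialising zero-filled blocks.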

> In the FID HLD I find mention of an object version field within the
> FID, which apparently gets incremented with each modification of the
> file.  Is that currently implemented in Lustre?


> I'm thinking of the case where a file is archived, recalled,
> modified, archived, recalled, modified...  The HSM will need a way to
> map the correct HSM copies to the correct version of the file, so
> hopefully the version field is already supported.

Only one version of a file is present in the file system.  The version is
merely a unique indicator that a file has changed.
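Since only one live version exists, the HSM-side catalog only needs to remember which version each stored copy was made from; a mismatch means the copy is stale and should be refreshed. A minimal sketch, with class and field names that are illustrative assumptions:

```python
class HsmCatalog:
    """Illustrative mapping of FID -> stored copy; not a real Lustre
    or HSM structure."""

    def __init__(self):
        self.copies = {}                  # fid -> (version, media_location)

    def record_archive(self, fid, version, location):
        self.copies[fid] = (version, location)

    def is_current(self, fid, live_version):
        """True if the stored copy matches the live file's version."""
        entry = self.copies.get(fid)
        return entry is not None and entry[0] == live_version
```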

> Does Lustre already support snapshot capabilities, or will it in the
> future?  When a snapshot is made, each archived/purged file within
> the snapshot effectively creates another reference to its copies
> within the HSM database.  An HSM file copy cannot be removed until it
> is known that no references remain to that particular version of the
> file, either within the live filesystem or within any snapshot.  Will
> the Coordinator be able to see the snapshot references, and avoid
> sending delete requests to the HSM until all snapshot references for
> a particular file version have been removed?  Are snapshots
> read-write or read-only?  If read-only, how do you intend to have
> users access purged files in snapshots?

TBD.  The key issue with snapshots is that multiple files in snapshots may
have shared blocks.  Dedup in ZFS raises similar issues.

> I haven't been able to figure out how backup/restore works, or will
> work, in Lustre.  Standard utilities like tar will wreak havoc by
> triggering file recall storms within the HSM.  Better is an
> intelligent backup package which understands that the HSM already has
> multiple copies of the file data, and so the backup program only
> needs to back up the metadata.  The problem here again is that new
> references to the HSM copies are being created, yet those references
> are not visible to the HSM, so methods are needed to ensure that
> seemingly-obsolete HSM copies are not deleted before the backups that
> reference them have also been deleted.  If you could provide a short
> description of how you intend backup/restore to work in combination
> with an HSM, or if you could provide pointers, that would be great.

The HSM should have a metadata database to implement "tape-side" (as opposed
to file-system-side) policy.  That database might hold all metadata and
manage references.

Examples of such policies are compliance policies (e.g. delete files from
this year) and backup policies (e.g. retain this or that set of files).  I
expect that, as in future file systems, a new concept of a fileset will be
required to be very flexible about what policies are applied to.
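The reference management such a database implies can be sketched very simply: a stored copy may be referenced by the live file system, by snapshots, and by backups, and only becomes deletable once no holder remains. All names here are illustrative assumptions:

```python
class CopyRefs:
    """Illustrative reference tracking for HSM copies; the live file
    system, each snapshot, and each backup count as separate holders."""

    def __init__(self):
        self.refs = {}                        # copy_id -> set of holders

    def add_ref(self, copy_id, holder):
        self.refs.setdefault(copy_id, set()).add(holder)

    def drop_ref(self, copy_id, holder):
        """Drop one reference; return True when the copy is deletable."""
        holders = self.refs.get(copy_id, set())
        holders.discard(holder)
        return not holders
```

This is exactly the property the snapshot and backup questions above turn on: delete requests must wait until the last snapshot or backup reference is gone, not just the live-filesystem one.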

Rick ...


> Regards, Kevan
> On Aug 4, 1:06 pm, Peter Braam <Peter.Br... at Sun.COM> wrote:
>> We spoke about the HSM plans some 10 days ago.  I think that the
>> conclusions are roughly as follows:
>> 1. It is desirable to reach a first implementation as soon as possible.
>> 2. Some design puzzles remain to ensure that HSM can keep up with Lustre
>> metadata clusters.
>> The steps to reach a first implementation can be summarized as:
>> 1. Include file closes in the changelog, if the file was opened for write.
>> Include timestamps in the changelog entries.  This allows the changelog
>> processor to see files that have become inactive and pass them on for
>> archiving.
>> 2. Build an open call that blocks for file retrieval and adapts timeouts
>> to avoid error returns.
>> 3. Until a least-recently-used log is built, use the e2scan utility to
>> generate lists of candidates for purging.
>> 4. Translate events and scan results into a form in which they can be
>> understood by ADM.
>> 5. Work with a single coordinator, whose role it is to avoid getting
>> multiple "close" records for the same file (a basic filter for events).
>> 6. Do not use initiators; these can come later and assist with load
>> balancing and freeing space on demand (both of which we can ignore for
>> the first release).
>> 7. Do not use multiple agents; the agents can move stripes of files etc.,
>> and this is not needed with a basic user-level solution based on consuming
>> the log.  The only thing the agent must do in release one is get the
>> attention of a data mover to restore files on demand.
>> Peter
>> _______________________________________________
>> Lustre-devel mailing list
>> Lustre-de... at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-devel
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel
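Step 5 of the plan quoted above (a single coordinator acting as a basic filter for duplicate "close" records) could be sketched as follows; the record format is an illustrative assumption, not the real changelog layout:

```python
def filter_closes(records):
    """Collapse repeated "close" records for the same FID so the policy
    manager sees each changed file once.  Illustrative sketch only."""
    seen_closed, out = set(), []
    for rec in records:
        if rec["op"] == "close":
            if rec["fid"] in seen_closed:
                continue              # duplicate close; drop it
            seen_closed.add(rec["fid"])
        out.append(rec)
    return out
```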
