[Lustre-devel] SAM-QFS, ADM, and Lustre HSM
Richard.Matthews at Sun.COM
Fri Jan 23 11:02:36 PST 2009
Nathaniel and all,
Thanks for putting this together.
Having a mover to put data into QFS is a great idea, and it can easily
use the QFS Linux client. I don't think you would necessarily get QFS
policy for native Lustre files unless the "moved" files retained the
Lustre attributes from which you want policy decisions made. There may
be ways to do this. You would automatically gain the file gathering of
QFS and its efficient tape handling. I also think there is an "archive
then release (purge)" policy that can be established. The applicable
Lustre namespace would be essentially duplicated in the QFS space, and
(I think) QFS classification and policy occur on that namespace. Doing
so gives you access to rich QFS policy. This also allows QFS to migrate
data to/from archive media without I/O or compute load on any Linux
clients.
Nathaniel Rutman wrote:
> (adding lustre-devel, dropping Bojanic from distro list; if anyone
> else wants off, let me know.)
> Hua Huang and Andreas wrote:
>> Thanks for the write-up. A few questions and comments.
>> SAM-QFS only runs on Solaris, so it is always
>> remotely mounted on Lustre client via network connection,
> QFS has a native Linux client.
> So the copy nodes would be linux nodes acting as clients for both
> Lustre and QFS. This would generally result in two network hops for
> the data, but by placing the clients on OST nodes and having the
> coordinator choose wisely, we can probably save one of the network
> hops most of the time. This may or may not be a good idea, depending
> on the load imposed on the OST. The copytool would also require us to
> pump the data from kernel to userspace and back, potentially resulting
> in significant bus loading. We could memory map the Lustre side
>> Nathaniel Rutman wrote:
>>> Hi all -
>>> So we all have a common starting point, I'm going to jump right in
>>> and describe the current plan for integrating Lustre's HSM feature
>>> (in development) with SAM-QFS and ADM.
>>> HSM for Lustre can be broken into two major components, both of
>>> which will live in userspace: the policy engine, which decides when
>>> files are archived (copy to (logical) tape), punched (removed from
>>> OSTs), or deleted; and the copytool, which moves file data to and
>>> from tape. A third component that we call the coordinator lives in
>>> kernel space and is responsible for relaying HSM requests to various
>>> client nodes.
>> s/tape/the archive/
> yes, I knew my "(logical) tape" statement needed to be clarified :)
>>> The policy engine collects filesystem info, maintains a database of
>>> files it is interested in, and makes archive and punch decisions
>>> that are then communicated back to Lustre. Note that the database
>>> is only used to make policy decisions, and is specifically _not_ a
>>> database of file/storage location information. Periodically, the
>>> policy engine gives a list of file identifiers and operations (via
>>> the coordinator) to any number of Lustre clients running copytools.
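To make the policy engine's role concrete, here is a minimal sketch of how a policy database could be turned into a list of (file identifier, operation) pairs. Everything in it is an assumption for illustration: the `FileRecord` fields, the `plan_actions` name, and the idle-time thresholds are hypothetical and are not part of the Lustre HSM design described in this thread.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class FileRecord:
    """Hypothetical policy-database row; holds no storage location info."""
    fid: str             # opaque file identifier
    idle_seconds: float  # time since last access (assumed policy input)
    archived: bool       # True once a copy exists in the archive


def plan_actions(records: Iterable[FileRecord],
                 archive_after: float,
                 punch_after: float) -> List[Tuple[str, str]]:
    """Return (fid, operation) pairs for files that cross a threshold."""
    actions = []
    for rec in records:
        if not rec.archived and rec.idle_seconds >= archive_after:
            # Not yet in the archive and idle long enough: copy it out.
            actions.append((rec.fid, "archive"))
        elif rec.archived and rec.idle_seconds >= punch_after:
            # Already archived and idle long enough: free the OST space.
            actions.append((rec.fid, "punch"))
    return actions
```

The resulting list is what would be handed, via the coordinator, to the clients running copytools; how the list is split among them is left out of this sketch.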
>> This work will be done by CEA as part of the HPSS HSM solution.
>> This work is generic in the sense that it could be SAM-QFS or any
>> other tape backend on the remote side for archival, right?
> Yes. The issue here is that the policy engine is a big part of the
> "brains" of the HSM, and could be a key differentiator for customers.
> That's why the ADM integration would likely replace the HPSS policy
> engine with ADM's Event Manager -- presumably we'll be able to get
> enhanced features by doing this. The actual benefits need to be
>> Is it expected that a given copytool would be given multiple files to
>> archive at one time? This would allow optimizing the archiving
>> to e.g. aggregate small files into a single archive object, but would
>> make identifying and extracting these files from the aggregate harder.
> I do expect the coordinator to hand a list of files to each
> copytool. But SAM-QFS would actually handle small file aggregation
> "underneath" the copytool itself; we don't have to worry about
>>> The copytool will take the list of files and perform the requested
>>> operation: archive, delete, or restore. (It is potentially possible
>>> to have finer-grained archive commands passed from the policy
>>> engine, e.g. archive_level_3.) It will then copy the files off to
>>> tape/storage using whatever hardware/software specific commands are
>>> necessary. Note that the file identifiers are opaque 16-byte
>>> strings. Files are requested using the same identifiers; "paths may
>>> change, but the fids remain the same" is the basic philosophy. The
>>> copytool may hash the fids into dirs/subdirs to relieve problems
>>> with a flat namespace, but this is invisible to Lustre. Having said
>>> that, additional information such as the full path name, EAs, etc.
>>> may be added by the copytool (using a tar wrapper, for example), for
>>> disaster recovery or striping recovery.
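As an illustration of hashing fids into dirs/subdirs, the sketch below maps a 16-byte fid to a two-level directory path on the archive side. The function name, the `/qfs/lustre_archive` root, the use of SHA-1, and the two-level fan-out are all assumptions chosen for the example; the thread does not specify a layout.

```python
import hashlib
from pathlib import PurePosixPath


def fid_to_archive_path(fid: bytes,
                        root: str = "/qfs/lustre_archive",  # hypothetical mount point
                        levels: int = 2) -> PurePosixPath:
    """Map an opaque 16-byte fid to root/xx/yy/<fid-hex>.

    Hashing spreads files evenly across subdirectories, so no single
    directory grows without bound even if fids are allocated sequentially.
    """
    if len(fid) != 16:
        raise ValueError("expected a 16-byte fid")
    digest = hashlib.sha1(fid).hexdigest()
    # Take two hex characters of the digest per directory level.
    subdirs = [digest[2 * i:2 * i + 2] for i in range(levels)]
    # The leaf name is the fid itself, so a restore can recompute the path.
    return PurePosixPath(root, *subdirs, fid.hex())
```

Because the path is derived only from the fid, renames on the Lustre side never require touching the archive copy, which matches the "paths may change, but the fids remain the same" philosophy.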
>>> The initial version of the copytool and policy engine will be
>>> written targeted for HPSS, but it is likely that the SAM-QFS
>>> integration will use the same pieces. Perhaps calling it the
>>> "Lustre policy engine" would be more appropriate.
>> So the initial version will be done by CEA as part of the HPSS.
> Part of the "HPSS-compatible Lustre HSM solution", which is our
> initial target, yes.
>> You mentioned other details above, which can be SAM_QFS specific?
>> I am trying to figure out if the full-version of copy-tool used in
>> Lustre/SAM_QFS integration will be implemented specifically for SAM-QFS
>> from the Lustre side.
> There are two items that I can think of that may be archive-specific:
> 1. hash the fids into dirs/subdirs to avoid a big flat namespace
> 2. inclusion of file extended attributes (EAs)
> But in fact, I don't know enough about HPSS to say we don't need these
> items anyhow. CEA, can you comment?
> I think current versions of HPSS are able to store EAs automatically,
> and QFS is not, so that may be one difference.
>>> Integration with SAM-QFS
>>> The SAM policy engine is tightly tied to the QFS filesystem,
>>> and for this reason it is not possible to replace the HPSS policy
>>> engine with SAM. However, SAM policies could be layered in at the
>>> copytool level. The split as we envision it is this: the existing
>>> Lustre policy engine decides which files should be archived and
>>> punched, and when; SAM-QFS decides how and where to archive them. The
>>> copytool in this case
>> SAM-QFS already does all these, i.e., "how and where".
> Yes. SAM policies would likely have to be written without reference
> to specific filenames/directories, since that info will not be readily
> available. If this proves to be performance-limiting (maybe certain
> file extensions (.mpg) should be stored in a different manner than
> another (.txt)), then we can probably find a way to pass the full
> pathname through to SAM, but this would require SAM code changes.
>>> is simply the unix "cp" command (or perhaps tar as mentioned above),
>>> that copies the file from the Lustre mount point to the QFS mount
>>> point on one (of many) clients that has both filesystems mounted.
>>> SAM-QFS's file staging and small-file aggregation (as well as
>>> parallel operation) would all be used "out of the box" to provide
>>> the best performance possible.
>> The one thing that should be taken into account is that the files being
>> moved from Lustre to SAM are losing the "age" information. This might
>> cause SAM some heartburn because all of the files being added will be
>> considered "new" but there will be a large enough influx of files that
>> it will need to archive and purge files within hours.
>> It may be that the SAM copytool will need to be modified to allow it
>> to pass on some "age" information (if that is something other than
>> atime and mtime) so the SAM policy engine can treat these files
>> Alternately, it may be that the SAM copytool will need to be smart
>> enough to mark the new files as "archive & purge immediately" in some manner.
> We will just use cp -a to preserve timestamps, ownership, perms etc; I
> don't see what any additional age info could be. As to the heartburn
> problem, QFS has disk cache as the first level of archive; as that
> fills, files are moved off to secondary storage automatically. We can adjust
> these watermarks to aggressively move files off to tape. If something
> backs up, the cp command will simply block. It would be nice to have
> some visibility when this situation occurs, but in fact it's not at
> all clear what we should do besides change our archiving policy. This
> is a general issue, not QFS specific.
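For readers who want to see what a "cp -a"-style copytool step amounts to, here is a rough Python equivalent for a single regular file on a POSIX system. It is only a sketch: the `archive_copy` name is hypothetical, it does not handle extended attributes or directories, and the real copytool discussed here is simply the cp command.

```python
import os
import shutil


def archive_copy(src: str, dst: str) -> None:
    """Copy one file, keeping timestamps, permissions, and (if allowed) ownership."""
    os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
    # copy2 copies the data, then the permission bits and atime/mtime.
    # Like cp, this call simply blocks if the destination filesystem stalls.
    shutil.copy2(src, dst)
    st = os.stat(src)
    try:
        # Changing ownership to another user normally requires root.
        os.chown(dst, st.st_uid, st.st_gid)
    except PermissionError:
        pass  # keep the copy even if ownership could not be preserved
```

Since mtime and atime survive the copy, an archive-side policy keyed on those timestamps would see the original ages rather than treating every incoming file as new.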
>> Again, SAM-QFS already does all of these. Correct?
>> So no code changes are expected at SAM-QFS side, right?
> Correct. As I see it today, no SAM-QFS code changes are necessary,
> and the QFS copytool will likely be identical or almost identical to
> the HPSS copytool.
>> For Lustre/SAM-QFS integration, could you point out specifically
>> which area (in this write-up) can be done by U.Minn students?
> I don't actually see any work to be done at this point. There's the
> pathname pass-through potential, but I'm not convinced it's at all
>>> Integration with ADM
>>> ADM's event manager would replace the HPSS policy engine. It would
>>> need some minor modifications to be integrated with the Lustre
>>> changelogs (instead of DMAPI) and ioctl interface to the
>>> coordinator. It also produces a similar list of files and actions.
>>> The ADM core would be the copytool, consuming the list and sending
>>> files to tape. We would also need a bit of work to pass
>>> communications between ADM's Archive Information Manager and the
>>> policy engine and copytools. ADM integration is dependent upon
>>> having a Linux ADM implementation, or a Solaris Lustre
>>> implementation (potentially Lustre client only).
>>> Feel free to question, correct, criticize.
Rick Matthews email: Rick.Matthews at sun.com
Sun Microsystems, Inc. phone:+1(651) 554-1518
1270 Eagan Industrial Road phone(internal): 54418
Suite 160 fax: +1(651) 554-1540
Eagan, MN 55121-1231 USA main: +1(651) 554-1500