[Lustre-devel] SAM-QFS, ADM, and Lustre HSM

Harriet G. Coverston Harriet.Coverston at Sun.COM
Fri Jan 23 08:46:24 PST 2009

On Jan 22, 2009, at 2:46 PM, Nathaniel Rutman wrote:
>>> Integration with SAM-QFS
>>> The SAM policy engine is tightly tied directly to the QFS  
>>> filesystem and for this reason it is not possible to replace the  
>>> HPSS policy engine with SAM.  However, SAM policies could be  
>>> layered in at the copytool level.  The split as we envision it is  
>>> this: existing Lustre policy engine decides which and when files  
>>> should be archived and punched, and SAM-QFS decides how and where  
>>> to archive them. The copytool in this case
>> SAM-QFS already does all these, i.e., "how and where".
> Yes.  SAM policies would likely have to be written without reference  
> to specific filenames/directories, since that info will not be  
> readily available.  If this proves to be performance-limiting (maybe  
> certain file extensions (.mpg) should be stored in a different  
> manner than another (.txt)), then we can probably find a way to pass  
> the full pathname through to SAM, but this would require SAM code  
> changes.
SAM supports classification policy rules for files: (1) the number of
copies, up to 4; (2) where to put the copies, i.e. on which VSN pools
(disk and/or tape, local and/or remote); and (3) when to make the
copies (time-based archiving). You specify the policy in the
archiver.cmd file. You can group files for a policy rule by pathname,
owner, group, size, wildcard, and access time.
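
To make the grouping idea concrete, here is a minimal Python sketch of
first-match file classification. It is only an illustration of the concept,
not archiver.cmd syntax and not how SAM implements it; the names FileInfo,
Rule, and classify, and the choice of first-match ordering, are all
assumptions made up for this example. Only the limit of 4 copies comes from
the description above.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import List, Optional

MAX_COPIES = 4  # from the text above: up to 4 archive copies


@dataclass
class FileInfo:
    """Hypothetical view of a file as a policy engine might see it."""
    path: str
    owner: str
    size: int  # bytes


@dataclass
class Rule:
    """Hypothetical policy rule: match criteria plus a copy count."""
    name: str
    copies: int
    path_glob: str = "*"          # wildcard match on the full pathname
    owner: Optional[str] = None   # None means "any owner"
    min_size: int = 0             # bytes

    def __post_init__(self) -> None:
        if not 1 <= self.copies <= MAX_COPIES:
            raise ValueError(f"copies must be between 1 and {MAX_COPIES}")

    def matches(self, f: FileInfo) -> bool:
        return (
            fnmatch(f.path, self.path_glob)
            and (self.owner is None or f.owner == self.owner)
            and f.size >= self.min_size
        )


def classify(f: FileInfo, rules: List[Rule]) -> Optional[Rule]:
    """Return the first rule that matches the file, or None."""
    for rule in rules:
        if rule.matches(f):
            return rule
    return None


# Example: .mpg files get two copies, everything else gets one.
rules = [
    Rule("video", copies=2, path_glob="*.mpg"),
    Rule("default", copies=1),
]
video_rule = classify(FileInfo("/proj/run1/out.mpg", "alice", 10_000), rules)
text_rule = classify(FileInfo("/proj/run1/notes.txt", "alice", 200), rules)
```

Note that the "video" rule only works if the pathname is visible to the
policy engine, which is exactly the pass-through question raised above.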
>>> is simply the unix "cp" command (or perhaps tar as mentioned  
>>> above), that copies the file from the Lustre mount point to the  
>>> QFS mount point on one (of many) clients that has both filesystems  
>>> mounted.  SAM-QFS's file staging and small-file aggregation (as  
>>> well as parallel operation) would all be used "out of the box" to  
>>> provide the best performance possible.
>> The one thing that should be taken into account is that the files being
>> moved from Lustre to SAM are losing the "age" information.  This might
>> cause SAM some heartburn because all of the files being added will be
>> considered "new" but there will be a large enough influx of files that
>> it will need to archive and purge files within hours.

>> It may be that the SAM copytool will need to be modified to allow it
>> to pass on some "age" information (if that is something other than
>> atime and mtime) so the SAM policy engine can treat these files sensibly.
>> Alternately, it may be that the SAM copytool will need to be smart enough
>> to mark the new files as "archive & purge immediately" in some manner.
There is an option to release files from the disk cache after all
archive copies have been made. You may want to set this in the
archiver.cmd file. The releasing is done automatically. It depends on
how you are going to use SAM. If it is just for backup, then, yes, set
this. However, in your mail above, you are also managing your disk
cache. In that case, it will be faster to retrieve files that are
still in our disk cache.

This brings up the question of restore. In case of a Lustre disk
failure, how are you going to restore your Lustre file system?
> We will just use cp -a to preserve timestamps, ownership, perms etc;  
> I don't see what any additional age info could be.  As to the  
> heartburn problem, QFS has disk cache as the first level of archive;  
> as that fills files are moved off to secondary automatically.  We  
> can adjust these watermarks to aggressively move files off to tape.   
> If something backs up, the cp command will simply block.  It would  
> be nice to have some visibility when this situation occurs, but in  
> fact it's not at all clear what we should do besides change our  
> archiving policy.  This is a general issue, not QFS specific.
You will want to set your disk cache thresholds based on the rate of
influx of data into the disk cache. We default to high 80%, low 70%,
which means that when the disk cache reaches 80%, we release the oldest
archived files until the disk cache drops to 70%. Some of our oil
customers set the thresholds to 60% high, 50% low because of the heavy
influx. Of course, if SAM does reach 100%, we block the writers until we
have space, so this is transparent to the application.
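
The high/low threshold behavior can be sketched roughly as follows. This is
an illustrative Python model only, not SAM's releaser code: the names
CachedFile and release_to_low_watermark are invented for the example, and
"oldest" is assumed here to mean least recently accessed. The 80% and 70%
defaults and the rule that only archived files are released come from the
description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CachedFile:
    """Hypothetical record for a file resident in the disk cache."""
    name: str
    size: int           # bytes
    last_access: float  # timestamp; smaller means older (assumption)
    archived: bool      # True once all archive copies exist


def release_to_low_watermark(
    files: List[CachedFile],
    capacity: int,
    high_pct: int = 80,
    low_pct: int = 70,
) -> List[str]:
    """Return names of files to release, oldest archived first.

    Nothing is released until usage reaches the high threshold. Once it
    does, archived files are released in age order until usage is at or
    below the low threshold, or no archived files remain.
    """
    used = sum(f.size for f in files)
    # Integer arithmetic avoids floating-point edge cases at the threshold.
    if used * 100 < high_pct * capacity:
        return []

    released: List[str] = []
    candidates = sorted(
        (f for f in files if f.archived), key=lambda f: f.last_access
    )
    for f in candidates:
        if used * 100 <= low_pct * capacity:
            break
        used -= f.size
        released.append(f.name)
    return released


# Example: an 85%-full cache of capacity 100 with one unarchived file.
cache = [
    CachedFile("a", 30, last_access=1.0, archived=True),
    CachedFile("b", 30, last_access=2.0, archived=True),
    CachedFile("c", 25, last_access=3.0, archived=False),
]
to_release = release_to_low_watermark(cache, capacity=100)
```

In this model a file that has not been archived yet is never released, so
a cache full of unarchived files stays full, which corresponds to the case
where writers block.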
>> Again, SAM-QFS already does all of these. Correct?
>> So no code changes are expected at SAM-QFS side, right?
> Correct.  As I see it today, no SAM-QFS code changes are necessary,  
> and the QFS copytool will likely be identical or almost identical to  
> the HPSS copytool.
Agreed. I don't see any SAM-QFS code changes required. The Lustre
copytool will write to HPSS using the HPSS APIs and write to SAM-QFS
with an ftp or pftp interface. This keeps the changes to a minimum.
>> For Lustre/SAM-QFS integration, could you point out specifically
>> which area (in this write-up) can be done by U.Minn students?
> I don't actually see any work to be done at this point.  There's the  
> pathname pass-through potential, but I'm not convinced it's at all  
> necessary.
I do see work to switch the HPSS APIs to ftp or pftp. If this is  
already supported by HPSS, then, yes, no changes are required.

    - Harriet

Harriet G. Coverston
Solaris, Storage Software			 |  Email: harriet.coverston at sun.com
Sun Microsystems, Inc.                     	 |  AT&T:  651-554-1515
1270 Eagan Industrial Rd., Suite 160       |  Fax:   651-554-1540
Eagan, MN 55121-1231
