[Lustre-devel] SAM-QFS, ADM, and Lustre HSM

Harriet G. Coverston Harriet.Coverston at Sun.COM
Mon Jan 26 18:26:15 PST 2009


Nathan,

On Jan 26, 2009, at 4:13 PM, Nathaniel Rutman wrote:

> Andreas Dilger wrote:
>> On Jan 23, 2009  13:02 -0600, Rick Matthews wrote:
>>
>>> Having a mover to put data into QFS is a great idea, and can easily
>>> use the QFS Linux client. I don't think you would necessarily get QFS
>>> policy for native Lustre files unless the "moved" files retained the
>>> Lustre attributes, from which you want policy decisions made.
>>>
>>
>> There will not necessarily be HSM policy data stored with every file
>> from Lustre, though there is a desire to store Lustre layout data in
>> the archive.  Is it possible to store extended attributes with each
>> file in QFS?
>>
> We can always store EA's, either natively or "poor-man's EA's" via  
> mini-tarballs.
>>
>>> The applicable Lustre namespace would be essentially duplicated in
>>> the QFS space, and (I think) QFS classification and policy occur on
>>> that name space. Doing so gives you access to rich QFS policy. This
>>> also allows QFS to migrate data to/from archive media without I/O or
>>> compute load on any Linux clients.
>>>
>>
>> The current Lustre HSM design will not export any of the filesystem
>> namespace to the archive, so that we don't have to track renames in
>> the archive.  The archive objects will only be identified by a Lustre
>> FID (128-bit file identifier).  IIRC, the HSM-specific copy tool would
>> be given the file name (though not necessarily the full pathname) in
>> order to perform the copyout, but the filesystem will be retrieving the
>> file from the archive by FID.  Nathan, can you confirm that is right?
>>
> There is a mechanism to get the current full pathname for a given  
> fid from userspace, so an HSM-specific copytool could find it out,  
> but a central tenet of the design here is that as far as the HSM is  
> concerned, the entire Lustre FS is a flat namespace of FIDs.

Be careful here. We are a file system. We don't have a limit on the
number of files in one directory, but we don't recommend more than
500,000 files in a single directory, or you will start to see
performance problems. You will have to create a directory tree rather
than use a flat namespace.
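
As a rough sketch of what such a tree could look like, the FID can be
hashed into a couple of directory levels so that no single directory
grows too large. Everything named here is an illustrative assumption,
not something SAM-QFS or the Lustre copytool defines: the function
fid_to_archive_path, the /qfs/archive root, the SHA-1 hash, and the
two-level, 256-way fan-out.

```python
import hashlib
from pathlib import PurePosixPath


def fid_to_archive_path(fid: str,
                        root: str = "/qfs/archive",
                        levels: int = 2) -> PurePosixPath:
    """Map a FID string to a path in a hashed directory tree (hypothetical layout)."""
    # Hash the FID so that sequentially allocated FIDs spread evenly
    # across directories instead of clustering in one.
    digest = hashlib.sha1(fid.encode("ascii")).hexdigest()
    # One byte (two hex digits) of the digest per level: 256 subdirectories each.
    parts = [digest[2 * i: 2 * i + 2] for i in range(levels)]
    # The FID itself stays the file name, so the object is still found by FID alone.
    return PurePosixPath(root, *parts, fid)


# Example FID string in sequence:object-id:version form.
example = fid_to_archive_path("0x200000400:0x1:0x0")
print(example)
```

With two levels of 256 there are 65,536 leaf directories, and the path
can always be recomputed from the FID, so nothing extra has to be
tracked on the Lustre side.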

> You can get a full pathname if you want to for catastrophe
> recovery, but Lustre itself will only speak to the HSM with FIDs.
> As I said in the other email, although SAM-QFS can do name-based
> policies, the "name" as far as QFS is concerned is just the FID, so
> name-based policies at the copytool level are worthless unless we
> a.) add the path/filename back to the file (EA, or use a tarball
> wrapper), and b.) modify the SAM policy engine to use the "real"
> path/filename instead of the FID.

Currently, we don't support policies based on EAs (extended attributes
are in 5.0). We have had lots of requests for this, especially from our
digital preservation customers.
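
For the tarball wrapper option in a.) above, a minimal sketch of the
idea might look like the following. It is only an illustration under
assumed names: wrap_with_metadata, read_metadata, the meta.json and
data member names, and the JSON layout are all hypothetical, and this
is not the actual copytool format.

```python
import io
import json
import tarfile


def wrap_with_metadata(data: bytes, fid: str, path: str) -> bytes:
    """Bundle file data with its FID and original path in a small tar archive."""
    meta = json.dumps({"fid": fid, "path": path}).encode("utf-8")
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        # Metadata goes first so it can be read without scanning the data.
        for name, payload in (("meta.json", meta), ("data", data)):
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_metadata(archive: bytes) -> dict:
    """Recover the FID and original path from a wrapped archive object."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        member = tar.extractfile("meta.json")
        return json.loads(member.read().decode("utf-8"))


wrapped = wrap_with_metadata(b"file contents", "0x200000400:0x1:0x0",
                             "/lustre/project/results.dat")
print(read_metadata(wrapped)["path"])
```

The point is only that the original path travels with the archived
object, so it can be recovered later even though the object is named
by FID.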
>
>
> But in the bigger picture sense, note that all this is simply an  
> optimization to allow SAM-QFS filename-based policies, which  
> ultimately only influences where SAM-QFS stores files, not whether  
> or when the files are archived by Lustre.  These "top-level" policy  
> decisions are made by the Lustre policy manager, and so perhaps  
> there is no real need to spend any effort getting b.) above  
> working.  Note that a.) is still useful for disaster recovery.
Agree. We have lots of customers with only one archive set. This means
all files are archived with the same policy -- very simple.

    - Harriet

Harriet G. Coverston
Solaris, Storage Software			 |  Email: harriet.coverston at sun.com
Sun Microsystems, Inc.                     	 |  AT&T:  651-554-1515
1270 Eagan Industrial Rd., Suite 160       |  Fax:   651-554-1540
Eagan, MN 55121-1231






