[Lustre-devel] SAM-QFS, ADM, and Lustre HSM

Harriet G. Coverston Harriet.Coverston at Sun.COM
Sun Feb 1 20:00:22 PST 2009


Nathan,

On Jan 30, 2009, at 6:21 PM, Nathaniel Rutman wrote:

> LEIBOVICI Thomas wrote:
>> At CEA, we are using our own copytool that directly uses HPSS API.  
>> This already exists and is in production for years.
>> I think there will be few modifications to adapt it to Lustre-HSM  
>> purpose
>> (basically, add fid <-> HSM id mapping and backup of attributes,  
>> path, stripe...)
> So then the QFS copytool will indeed be a new tool, and should be  
> scheduled accordingly.
> Features:
> 1. "cp --preserve" like functionality (include metadata attributes  
> in cp)
> 2. add EA's (create mini-tarball)
> 3. implement FID hash to subdivide namespace
> 4. periodic status reporting (via ioctl on file)
>
>
> Harriet G. Coverston wrote:
>>> There is a mechanism to get the current full pathname for a given  
>>> fid from userspace, so an HSM-specific copytool could find it out,  
>>> but a central tenet of the design here is that as far as the HSM  
>>> is concerned, the entire Lustre FS is a flat namespace of FIDs.
>>
>> Be careful here. We are a file system. We don't have a limit on #  
>> of files in one directory, but we don't recommend more than 500,000  
>> files in one single directory or you will start to see some  
>> performance problems. You will have to create a tree, not use a  
>> flat namespace.
> Yes, a tree based on a hash of the fid.
> The other option is to use the actual filename for storage, but from  
> Lustre's point of view this gets extremely tricky.  For example:
> Send /foo/bar to archive.  Client A opens /foo/bar.  Client B  
> renames /foo/bar to /abc/xyz, but this change hasn't propagated to  
> the archive yet.  Client A now tries to read its open file handle,  
> which tells Lustre to read the offline file FID 123, which it  
> translates to /abc/xyz currently, which the archive doesn't know  
> about yet.  Not just xyz, but renames on any ancestor path element  
> cause similar misses.  Since the FID remains constant throughout the  
> life of a file, we don't have to worry about any namespace changes  
> (file or parents).  If there was an alternate way of bypassing the  
> archive's namespace to directly access a file, we could conceivably  
> store e.g. an archive-specific identifier within the Lustre stripe  
> EA, and pass this down to the copytool when reading an offline file,  
> but this presupposes that such a thing exists, is of reasonable  
> size, has a userspace method to access it, etc.
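
A minimal sketch of the "tree based on a hash of the fid" idea, for illustration only. The FID string format, the /archive root, the choice of SHA-1, and the two-level fan-out of 256 directories each are all assumptions, not anything specified in this thread.

```python
import hashlib
from pathlib import PurePosixPath


def fid_to_archive_path(fid: str, root: str = "/archive", levels: int = 2) -> PurePosixPath:
    """Map a FID string to a path inside a hashed directory tree.

    Hypothetical helper: each level uses one byte (two hex digits) of the
    SHA-1 digest of the FID, so each level fans out into 256 directories.
    """
    digest = hashlib.sha1(fid.encode("ascii")).hexdigest()
    # Take successive two-hex-digit slices of the digest as directory names.
    parts = [digest[2 * i:2 * i + 2] for i in range(levels)]
    # The FID itself is the leaf file name, so renames in Lustre never move it.
    return PurePosixPath(root, *parts, fid)
```

With two levels there are 65,536 leaf directories, so staying under 500,000 files per directory still leaves room for roughly 32 billion files. The path depends only on the FID, so a rename of the file or of any ancestor directory does not change it.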

Yes, we have a FID-like concept in SAM-QFS. It is called the file ID.
It is 64 bits and consists of the inode number and generation number.
It is unique. You can store it. You can issue an ioctl to open a file
by ID, an ioctl to stat it by ID, etc. This is much more efficient
than using the filename, which requires an expensive lookup. It means
that if you store and use the ID, you can cover the rename window and
still be guaranteed to get the right file. Note that we don't
rearchive on a rename.
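
To illustrate why an ID of this kind covers the rename window, here is a toy model. It is only a sketch: the 32/32 bit split between inode and generation is an assumption (the only facts above are "64 bits" and "inode/generation"), ToyArchive is a hypothetical in-memory stand-in, and none of this calls the real SAM-QFS ioctls.

```python
def pack_file_id(inode: int, generation: int) -> int:
    """Pack inode and generation into one 64-bit ID (assumed 32 bits each)."""
    if not (0 <= inode < 2**32 and 0 <= generation < 2**32):
        raise ValueError("inode and generation must each fit in 32 bits")
    return (inode << 32) | generation


def unpack_file_id(file_id: int) -> tuple:
    """Split a packed ID back into (inode, generation)."""
    return file_id >> 32, file_id & 0xFFFFFFFF


class ToyArchive:
    """Hypothetical archive that indexes file data both by ID and by path."""

    def __init__(self):
        self.by_id = {}    # file ID -> data
        self.by_path = {}  # path -> file ID

    def store(self, path: str, file_id: int, data: bytes) -> None:
        self.by_id[file_id] = data
        self.by_path[path] = file_id

    def rename(self, old: str, new: str) -> None:
        # Only the name mapping changes; the ID and the data stay put.
        self.by_path[new] = self.by_path.pop(old)

    def read_by_id(self, file_id: int) -> bytes:
        return self.by_id[file_id]

    def read_by_path(self, path: str) -> bytes:
        return self.by_id[self.by_path[path]]
```

After store("/foo/bar", ...) followed by rename("/foo/bar", "/abc/xyz"), read_by_id still returns the data, while read_by_path("/foo/bar") raises KeyError. That is the miss described in the rename example above, and the one a stored ID avoids.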

I really think a replicated namespace will be much more intuitive, and
it solves restore. If you prefer to build a tar container, that is OK,
too. The tar file can have a suffix, so you know it is a tar file and
can untar it on restore.
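
A rough sketch of the tar container option, assuming the copytool wraps the file data and its attributes in a small tar file named after the FID with a ".tar" suffix. The member names "data" and "attrs.json", the JSON encoding of the attributes, and both function names are hypothetical choices for illustration.

```python
import io
import json
import tarfile


def wrap_in_tar(fid: str, data: bytes, attrs: dict) -> tuple:
    """Return (archive_name, tar_bytes) holding the data plus its attributes."""
    members = (
        ("data", data),
        ("attrs.json", json.dumps(attrs, sort_keys=True).encode("utf-8")),
    )
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    # The suffix is what tells the restore side that this object is a tar file.
    return fid + ".tar", buf.getvalue()


def unwrap_tar(archive_name: str, blob: bytes) -> tuple:
    """Reverse wrap_in_tar: return (data, attrs) from a suffixed tar object."""
    if not archive_name.endswith(".tar"):
        raise ValueError("not a tar container: " + archive_name)
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        data = tar.extractfile("data").read()
        attrs = json.loads(tar.extractfile("attrs.json").read().decode("utf-8"))
    return data, attrs
```

The attrs dictionary is where the path, stripe information, and EAs mentioned earlier in the thread could travel with the file, so a restore does not depend on the archive's own namespace.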
>
>
>>
>>> You can get a full pathname if you want to for catastrophe  
>>> recovery, but Lustre itself will only speak to the HSM with FIDs.
>>> As I said in the other email, although SAM-QFS can do name-based  
>>> policies, the "name" as far as QFS is concerned is just the FID,  
>>> so  name-based policies at the copytool level are worthless.    
>>> Unless we a.) add the path/filename back to the file (EA, or use a  
>>> tarball wrapper), and b.) modify the SAM policy engine to use the  
>>> "real" path/filename instead of the FID.
>>
>> Currently, we don't support policy using EA (extended attributes  
>> are in 5.0). We have had lots of requests for this, especially from  
>> our digital preservation customers.
> Ah, policy based on EAs would be the general case, yes.
Yes, this would be a nice feature for us.

    - Harriet

Harriet G. Coverston
Solaris, Storage Software			 |  Email: harriet.coverston at sun.com
Sun Microsystems, Inc.                     	 |  AT&T:  651-554-1515
1270 Eagan Industrial Rd., Suite 160       |  Fax:   651-554-1540
Eagan, MN 55121-1231






