[Lustre-devel] SAM-QFS, ADM, and Lustre HSM
Harriet G. Coverston
Harriet.Coverston at Sun.COM
Mon Feb 2 07:07:54 PST 2009
Colin,
On Feb 2, 2009, at 8:56 AM, Colin Ngam wrote:
>>>
>>>
>>> Harriet G. Coverston wrote:
>>>>> There is a mechanism to get the current full pathname for a
>>>>> given fid from userspace, so an HSM-specific copytool could find
>>>>> it out, but a central tenet of the design here is that as far as
>>>>> the HSM is concerned, the entire Lustre FS is a flat namespace
>>>>> of FIDs.
>>>>
>>>> Be careful here. We are a file system. We don't have a limit on
>>>> the number of files in one directory, but we don't recommend more
>>>> than 500,000 files in a single directory or you will start to see
>>>> some performance problems. You will have to create a tree, not
>>>> use a flat namespace.
>>> Yes, a tree based on a hash of the fid.
>>> The other option is to use the actual filename for storage, but
>>> from Lustre's point of view this gets extremely tricky. For
>>> example:
>>> Send /foo/bar to archive. Client A opens /foo/bar. Client B
>>> renames /foo/bar to /abc/xyz, but this change hasn't propagated to
>>> the archive yet. Client A now tries to read its open file handle,
>>> which tells Lustre to read the offline file FID 123, which it
>>> translates to /abc/xyz currently, which the archive doesn't know
>>> about yet. Not just xyz, but renames on any ancestor path element
>>> cause similar misses. Since the FID remains constant throughout
>>> the life of a file, we don't have to worry about any namespace
>>> changes (file or parents). If there were an alternate way of
>>> bypassing the archive's namespace to directly access a file, we
>>> could conceivably store e.g. an archive-specific identifier within
>>> the Lustre stripe EA, and pass this down to the copytool when
>>> reading an offline file, but this presupposes that such a thing
>>> exists, is of reasonable size, has a userspace method to access
>>> it, etc.
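
The "tree based on a hash of the fid" mentioned above could look roughly like the following. This is a minimal sketch, not anything taken from Lustre or SAM-QFS: the function name fid_to_archive_path, the choice of SHA-1, the two-level fan-out, and the FID string used in the example are all assumptions made for illustration.

```python
import hashlib


def fid_to_archive_path(fid: str, levels: int = 2, width: int = 2) -> str:
    """Map a FID string to a relative path in a hashed directory tree.

    Hypothetical helper: the leading hex digits of a hash of the FID pick
    the directories, and the FID itself is used as the file name.
    """
    digest = hashlib.sha1(fid.encode("ascii")).hexdigest()
    # Take `levels` chunks of `width` hex digits as directory names.
    dirs = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return "/".join(dirs + [fid])


if __name__ == "__main__":
    # Illustrative FID string only; the real FID format is not assumed here.
    print(fid_to_archive_path("0x200000400:0x1:0x0"))
```

Because the path depends only on the FID, a rename of the file or of any ancestor directory does not change where the archive copy lives. With two levels of two hex digits there are 65,536 leaf directories, which spreads files out well before any single directory approaches the 500,000-file guideline.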
>>
>> Yes, we have a FID-like concept in SAM-QFS. It is called the file
>> ID. It is 64 bits and consists of the inode/generation number. It
>> is unique. You can store it. You can issue an ioctl to open the ID.
>> You can issue an ioctl to do an ID stat, etc. It is much more
>> efficient than using the filename (expensive lookup). This means if
>> you store and use the ID, you can cover the rename window and still
>> be guaranteed that you will get the right file. Note, we don't
>> rearchive on a rename.
> I believe this facility only exists on the metadata server node and
> not on the Linux/Solaris clients. Am I correct?
It is supported on the MDS and the Solaris client nodes, but currently
not on Linux.
I thought about this a bit. After we do a samfsrestore (reload the
metadata after a crash of the SAM-QFS disk cache), the ID is not the
same. Therefore, you would not be able to use this after a SAM restore
unless the ID that you are storing is updated. We really need to think
about this.
- Harriet
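
To make the restore concern concrete, here is a minimal sketch of what "updating the stored ID" might involve on the Lustre side. Everything in it is hypothetical: the ArchiveIdTable class, its method names, and in particular the assumption that an old-ID to new-ID mapping is available after samfsrestore. The thread does not say that such a mapping exists.

```python
from typing import Dict, List, Optional


class ArchiveIdTable:
    """Hypothetical table of Lustre FID -> archive file ID."""

    def __init__(self) -> None:
        self._by_fid: Dict[str, int] = {}

    def record(self, fid: str, archive_id: int) -> None:
        """Remember the archive ID that was stored for this FID."""
        self._by_fid[fid] = archive_id

    def lookup(self, fid: str) -> Optional[int]:
        """Return the stored archive ID, or None if the FID is unknown."""
        return self._by_fid.get(fid)

    def remap_after_restore(self, old_to_new: Dict[int, int]) -> List[str]:
        """Rewrite stored IDs using an (assumed) old -> new ID mapping.

        Returns the FIDs whose stored ID had no replacement, i.e. entries
        that would now point at the wrong file or at nothing.
        """
        stale: List[str] = []
        for fid, old_id in list(self._by_fid.items()):
            new_id = old_to_new.get(old_id)
            if new_id is None:
                stale.append(fid)
            else:
                self._by_fid[fid] = new_id
        return stale
```

Any FID returned by remap_after_restore would need some other way to find its archive copy, which is the gap being pointed out here.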
>
>
> Thanks.
>
> colin
>>
>> I really think a replicated namespace will be much more intuitive
>> and solves restore. If you prefer to build a tar container, that is
>> OK, too. The tar file can have a suffix so that you know it is tar
>> and you can untar it.
>>>
>>>
>>>>
>>>>> You can get a full pathname if you want to for catastrophe
>>>>> recovery, but Lustre itself will only speak to the HSM with FIDs.
>>>>> As I said in the other email, although SAM-QFS can do name-based
>>>>> policies, the "name" as far as QFS is concerned is just the FID,
>>>>> so name-based policies at the copytool level are worthless
>>>>> unless we (a) add the path/filename back to the file (EA, or use
>>>>> a tarball wrapper), and (b) modify the SAM policy engine to use
>>>>> the "real" path/filename instead of the FID.
>>>>
>>>> Currently, we don't support policy using EA (extended attributes
>>>> are in 5.0). We have had lots of requests for this, especially
>>>> from our digital preservation customers.
>>> Ah, policy based on EAs would be the general case, yes.
>> Yes, this would be a nice feature for us.
>>
>> - Harriet
>>
>> Harriet G. Coverston
>> Solaris, Storage Software | Email: harriet.coverston at sun.com
>> Sun Microsystems, Inc. | AT&T: 651-554-1515
>> 1270 Eagan Industrial Rd., Suite 160 | Fax: 651-554-1540
>> Eagan, MN 55121-1231
>>
>>
>>
>>
>
- Harriet
Harriet G. Coverston
Solaris, Storage Software | Email: harriet.coverston at sun.com
Sun Microsystems, Inc. | AT&T: 651-554-1515
1270 Eagan Industrial Rd., Suite 160 | Fax: 651-554-1540
Eagan, MN 55121-1231