[Lustre-devel] SAM-QFS, ADM, and Lustre HSM
Harriet G. Coverston
Harriet.Coverston at Sun.COM
Mon Jan 26 18:26:15 PST 2009
Nathan,
On Jan 26, 2009, at 4:13 PM, Nathaniel Rutman wrote:
> Andreas Dilger wrote:
>> On Jan 23, 2009 13:02 -0600, Rick Matthews wrote:
>>
>>> Having a mover to put data into QFS is a great idea, and can
>>> easily use the QFS Linux client. I don't think you would
>>> necessarily get QFS policy for native Lustre files unless the
>>> "moved" files retained the Lustre attributes, from which you want
>>> policy decisions made.
>>>
>>
>> There will not necessarily be HSM policy data stored with every file
>> from Lustre, though there is a desire to store Lustre layout data in
>> the archive. Is it possible to store extended attributes with each
>> file in QFS?
>>
> We can always store EAs, either natively or as "poor man's EAs" via
> mini-tarballs.
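The "poor man's EA" idea can be sketched with Python's standard `tarfile` module: wrap the file data in a one-member PAX tarball whose per-member pax headers carry the attributes. This is a minimal illustration, not part of the Lustre HSM or SAM-QFS design; the `LUSTRE.*` header keywords, the function names, and the hex encoding of the (binary) layout are all hypothetical choices.

```python
import io
import tarfile

PREFIX = "LUSTRE."  # hypothetical vendor keyword prefix for pax headers

def wrap_with_eas(fid_name: str, data: bytes, eas: dict[str, str]) -> bytes:
    """Pack data into a one-member PAX tarball; the EAs ride in pax headers."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        info = tarfile.TarInfo(name=fid_name)
        info.size = len(data)
        # pax header values must be text, so binary layouts are hex-encoded
        # by the caller (an assumption of this sketch).
        info.pax_headers = {PREFIX + k: v for k, v in eas.items()}
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def unwrap(blob: bytes) -> tuple[bytes, dict[str, str]]:
    """Recover the file data and the EAs stored by wrap_with_eas()."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        info = tar.next()  # the single member
        data = tar.extractfile(info).read()
        eas = {k[len(PREFIX):]: v for k, v in info.pax_headers.items()
               if k.startswith(PREFIX)}
    return data, eas
```

Because the attributes live inside the archived object itself, they survive even if the archive's own namespace knows nothing but the FID, which is what makes this useful for disaster recovery.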
>>
>>> The applicable Lustre namespace would be essentially duplicated in
>>> the QFS space, and (I think) QFS classification and policy occur on
>>> that name space. Doing so gives you access to rich QFS policy. This
>>> also allows QFS to migrate data to/from archive media without I/O or
>>> compute load on any Linux clients.
>>>
>>
>> The current Lustre HSM design will not export any of the filesystem
>> namespace to the archive, so that we don't have to track renames in
>> the archive. The archive objects will only be identified by a Lustre
>> FID (128-bit file identifier). IIRC, the HSM-specific copy tool would
>> be given the file name (though not necessarily the full pathname) in
>> order to perform the copyout, but the filesystem will be retrieving the
>> file from the archive by FID. Nathan, can you confirm that is right?
>>
> There is a mechanism to get the current full pathname for a given
> fid from userspace, so an HSM-specific copytool could find it out,
> but a central tenet of the design here is that as far as the HSM is
> concerned, the entire Lustre FS is a flat namespace of FIDs.
Be careful here. We are a file system. We don't have a limit on the
number of files in one directory, but we don't recommend more than
500,000 files in a single directory, or you will start to see
performance problems. You will have to create a directory tree, not
use a flat namespace.
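To make the tree suggestion concrete, here is a minimal Python sketch of one way a copytool might spread FID-named archive files over a fixed two-level directory fan-out. The SHA-1 hashing scheme, the function names, and the 32-hex-digit file name (assuming the usual split of a FID into a 64-bit sequence, a 32-bit object ID, and a 32-bit version) are illustrative assumptions, not part of the Lustre HSM design.

```python
import hashlib

# Guideline from this thread: stay under ~500,000 entries per QFS directory.
MAX_FILES_PER_DIR = 500_000

def fid_hex(seq: int, oid: int, ver: int) -> str:
    """Render a 128-bit FID (64-bit seq, 32-bit oid, 32-bit ver) as 32 hex digits."""
    return f"{seq:016x}{oid:08x}{ver:08x}"

def fid_to_archive_path(seq: int, oid: int, ver: int, levels: int = 2) -> str:
    """Hypothetical mapping of a FID to 'xx/yy/<fid>' inside the archive.

    Each directory level is one byte of a SHA-1 digest of the FID name, so
    files spread evenly over 256**levels leaf directories.
    """
    name = fid_hex(seq, oid, ver)
    digest = hashlib.sha1(name.encode("ascii")).hexdigest()
    dirs = [digest[2 * i:2 * i + 2] for i in range(levels)]
    return "/".join(dirs + [name])

def capacity(levels: int) -> int:
    """Total files the tree holds before any leaf exceeds the guideline."""
    return (256 ** levels) * MAX_FILES_PER_DIR
```

With one level the tree tops out at 128 million files; with two levels (65,536 leaf directories) it holds about 32.8 billion before any directory passes 500,000 entries. The path is a pure function of the FID, so Lustre can still retrieve by FID without tracking renames.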
> You can get a full pathname if you want to for catastrophe
> recovery, but Lustre itself will only speak to the HSM with FIDs.
> As I said in the other email, although SAM-QFS can do name-based
> policies, the "name" as far as QFS is concerned is just the FID, so
> name-based policies at the copytool level are worthless. Unless we
> a.) add the path/filename back to the file (EA, or use a tarball
> wrapper), and b.) modify the SAM policy engine to use the "real"
> path/filename instead of the FID.
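The point about name-based policies can be illustrated with a small sketch: a glob-style rule table evaluated against the FID-only archive name never matches, while the same table evaluated against a path restored per a.) does. The rule patterns, archive-set names, and the `path` EA key are hypothetical, and this is not SAM-QFS's actual policy syntax or engine.

```python
from fnmatch import fnmatch

# Hypothetical archive-set rules: (glob on path, archive set); first match wins.
RULES = [
    ("/scratch/*/checkpoints/*", "short_term"),
    ("/projects/*/results/*", "long_term"),
]

def pick_archive_set(name: str) -> str:
    """Return the first archive set whose pattern matches, else 'default'."""
    for pattern, archive_set in RULES:
        if fnmatch(name, pattern):
            return archive_set
    return "default"

def archive_set_for(fid_name: str, eas: dict[str, str]) -> str:
    """b.) Prefer the real path recovered via a.) over the FID-only name."""
    return pick_archive_set(eas.get("path", fid_name))
```

Given only a FID name, every file falls through to `default`, which is exactly the single-archive-set behavior; the rules become meaningful only when both a.) and b.) are in place.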
Currently, we don't support policies based on EAs (extended attributes
are in 5.0). We have had lots of requests for this, especially from our
digital preservation customers.
>
>
> But in the bigger picture sense, note that all this is simply an
> optimization to allow SAM-QFS filename-based policies, which
> ultimately only influences where SAM-QFS stores files, not whether
> or when the files are archived by Lustre. These "top-level" policy
> decisions are made by the Lustre policy manager, and so perhaps
> there is no real need to spend any effort getting b.) above
> working. Note that a.) is still useful for disaster recovery.
Agree. We have lots of customers with only one archive set, which means
all files are archived with the same policy. Very simple.
- Harriet
Harriet G. Coverston
Solaris, Storage Software | Email: harriet.coverston at sun.com
Sun Microsystems, Inc. | AT&T: 651-554-1515
1270 Eagan Industrial Rd., Suite 160 | Fax: 651-554-1540
Eagan, MN 55121-1231