[Lustre-devel] Wide striping

Nathan Rutman Nathan_Rutman at xyratex.com
Wed Oct 5 11:44:07 PDT 2011


On Oct 5, 2011, at 11:02 AM, Eric Barton wrote:

> Shadow,
> 
> Your comment describes create-on-write (CROW), which is vulnerable
> to orphan creation by clients which have been evicted from the MDS
> but are not actually dead, unless further safeguards are implemented
> such as capabilities or server-cluster-wide client eviction.
Create-on-write isn't really an integral part of this design, just a side thought.  Let's leave it out of the
discussion for now.
> 
> I also think that the decision to use FIDs in the way you suggest
> has architectural implications which would benefit from further
> discussion.  The original idea was that a FID would be all you need
> to identify any object (including its target) and that using them
> uniformly in this way could help simplify the code and enable further
> development - e.g. to allow unified targets which mix namespace and
> data objects to better support small/sparse files.
> 
> Making the FID just a unique identifier which requires a target index
> to specify a specific object doesn't have to be inconsistent with
> uniform usage for data and metadata, but it has further knock-on
> implications which must be acknowledged and debated explicitly
> before we go further.  We really must be confident we've thought
> through all the consequences of our architectural decisions before
> we invest development effort in them.  It's just too expensive to
> reverse a bad decision otherwise.
That's what we're trying to do now :)
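
To make the two addressing models concrete, here is a toy Python sketch. It is not Lustre code: the names Fid, LocationDb and ObjectRef, the sequence ranges and the target labels are all made up for illustration. Model 1 treats the FID as self-locating (a sequence-to-target table resolves the target); model 2 is the {target index, FID} pair, where the FID alone is only a unique identifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fid:
    """Hypothetical FID: a sequence number plus an object id within it."""
    seq: int
    oid: int
    ver: int = 0


class LocationDb:
    """Model 1: the FID alone identifies the object AND its target.

    A table of sequence ranges resolves which target owns a FID.
    (Illustrative only; the range layout is an assumption.)
    """

    def __init__(self):
        self.ranges = []  # list of (first_seq, last_seq, target_name)

    def add_range(self, first_seq, last_seq, target):
        self.ranges.append((first_seq, last_seq, target))

    def locate(self, fid):
        for first_seq, last_seq, target in self.ranges:
            if first_seq <= fid.seq <= last_seq:
                return target
        raise KeyError(f"no target owns sequence {fid.seq:#x}")


@dataclass(frozen=True)
class ObjectRef:
    """Model 2: the FID is only a unique id; the target index is explicit."""
    target_index: int
    fid: Fid
```

With model 2, one FID can name a different object on every target, e.g. ObjectRef(3, fid) and ObjectRef(4, fid) are distinct objects that share a FID.  With model 1 that is not possible, because locate(fid) has exactly one answer.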

The issue as I see it is that we're thinking about a feature that could be useful 
today, and is implementable today, except for the fact that there are some 
longer term plans that might conflict.  Our wide-striping could be implemented
on top of WC's future FID-on-OST plans -- but would require that code to exist.
So then we have to decide if waiting is the best option, or whether a more 
minimal change (probably the "common object ID" from my original arch email)
could land first, and then DNE FID-on-OST could change it later. 


> 
>          Cheers,
>                   Eric
> 
>> -----Original Message-----
>> From: lustre-devel-bounces at lists.lustre.org [mailto:lustre-devel-bounces at lists.lustre.org] On Behalf
>> Of Alexey Lyashkov
>> Sent: 05 October 2011 10:29 AM
>> To: wangdi
>> Cc: Alexander Boyko; Lustre Development Mailing List; Artem Blagodarenko; Nathan Rutman
>> Subject: Re: [Lustre-devel] Wide striping
>> 
>> Hi All,
>>> 
>>> FID-on-OST is actually part of DNE (distributed namespace) phase I.  It basically follows the current
>>> FID client/server infrastructure.
>>> 
>>> 1. The MDT is the FID client, which requests FIDs from the OST and allocates FIDs for objects during
>>> pre-creation.
>>> 2. The OST is the FID server, which allocates FIDs to MDTs and requests a super FID sequence from
>>> the FID control server (root MDT).
>>> 3. As with MDT FIDs, there will be an OI to map FIDs to objects inside the OST.
>>> 
>>> The code will be released with DNE sometime next year.
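
The three roles in the list above can be sketched roughly as follows. This is only a toy model under stated assumptions, not the DNE implementation: the class names (SeqController, FidServer, FidClient), the range widths, and the assumption that FIDs are handed out as (sequence, object id) pairs drawn from disjoint sequence ranges are all illustrative.

```python
class SeqController:
    """Root MDT in the role of FID control server (hypothetical model).

    Hands out disjoint "super" ranges of sequence numbers.
    """

    def __init__(self, start=0x1000, width=0x100):
        self.next_start = start
        self.width = width

    def alloc_super(self):
        first = self.next_start
        self.next_start += self.width
        return range(first, first + self.width)


class FidServer:
    """OST in the role of FID server.

    Obtains super ranges from the controller and grants single
    sequences to its clients (the MDTs).
    """

    def __init__(self, controller):
        self.controller = controller
        self.pending = iter(())  # sequences not yet granted

    def alloc_seq(self):
        seq = next(self.pending, None)
        if seq is None:
            # Current super range is used up: ask the controller for more.
            self.pending = iter(self.controller.alloc_super())
            seq = next(self.pending)
        return seq


class FidClient:
    """MDT in the role of FID client.

    Allocates FIDs inside its current sequence (e.g. during
    pre-creation) and requests a new sequence when it runs out.
    """

    def __init__(self, server, oids_per_seq=4):
        self.server = server
        self.oids_per_seq = oids_per_seq
        self.seq = None
        self.next_oid = 1

    def alloc_fid(self):
        if self.seq is None or self.next_oid > self.oids_per_seq:
            self.seq = self.server.alloc_seq()
            self.next_oid = 1
        fid = (self.seq, self.next_oid)
        self.next_oid += 1
        return fid
```

Because every super range comes from the single controller, two OSTs never grant the same sequence, so FIDs stay unique cluster-wide without the MDTs coordinating with each other.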
>>> 
>> I don't think we need special FIDs for OST objects, unless we want to migrate an object between
>> different data containers across the cluster.
>> I don't think that is a priority for now,
>> so we can simplify FID management for OSTs now.
>> Each data object can be identified by the pair {OST_INDEX / OST_UUID, MDT_FID}.
>> In that case the OST does not need to allocate any FIDs, and the MDT can reuse the current preallocation scheme.
>> In fact, we do not need to assign a FID to an OST object at file creation time (i.e. when creating the LSM), but we
>> do need a guarantee that a free OST object exists when a client tries to access that object.
>> So the OST can preallocate a pool of objects and report its size to the MDT;
>> the MDT knows it is using some objects from that pool, but does not know which object ID is assigned to which file.
>> To avoid confusing the OST, the client sends the MDT FID to the OST when it needs to access an OST object.
>> The OST looks in the OI database to check whether that FID is already assigned to an object.
>> If it is, the IO returns an inode; otherwise the OST grabs any free object from the pool and
>> assigns it to that FID.
>> That's all.
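
The lookup-or-assign step described above might look roughly like this. It is a minimal sketch, not Lustre code: the class OstSketch, its method names, the object numbering and the use of a plain dict as the OI table are all assumptions made for illustration.

```python
class OstSketch:
    """Toy model of one OST: a pool of preallocated objects plus an OI table."""

    def __init__(self, ost_index, pool_size):
        self.ost_index = ost_index
        # Preallocated objects not yet bound to any file (hypothetical ids).
        self.free_pool = list(range(1, pool_size + 1))
        # OI table: MDT FID -> local object id.
        self.oi = {}

    def pool_size(self):
        """Number of free objects, i.e. what the OST would report to the MDT."""
        return len(self.free_pool)

    def lookup_or_assign(self, mdt_fid):
        """Return the object bound to mdt_fid, binding a free one on first access."""
        obj = self.oi.get(mdt_fid)
        if obj is not None:
            return obj  # FID already assigned: plain OI hit
        if not self.free_pool:
            raise RuntimeError("no preallocated objects left on this OST")
        obj = self.free_pool.pop(0)  # grab any free object from the pool
        self.oi[mdt_fid] = obj       # and assign it to that FID
        return obj
```

The same MDT FID can be presented to every OST in the stripe; each OST binds it to one of its own objects, which is why the pair {OST_INDEX, MDT_FID} is enough to name a stripe object in this scheme.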
>> 
>> Orphan cleanup does not need to change in that case: the MDT sends the last allocated objid, and the OST
>> kills the unallocated objects and returns the last index to the MDT.
>> The open-unlink case needs to change to put a FID in the LLOG record, and the OST needs to change to handle
>> a FID as the object index.
>> 
>> 
>> 
>> --------------------------------------------
>> Alexey Lyashkov
>> alexey_lyashkov at xyratex.com
>> 
>> 
>> 
>> 
>> ______________________________________________________________________
>> This email may contain privileged or confidential information, which should only be used for the
>> purpose for which it was sent by Xyratex. No further rights or licenses are granted to use such
>> information. If you are not the intended recipient of this message, please notify the sender by return
>> and delete it. You may not use, copy, disclose or rely on the information contained in it.
>> 
>> Internet email is susceptible to data corruption, interception and unauthorised amendment for which
>> Xyratex does not accept liability. While we have taken reasonable precautions to ensure that this
>> email is free of viruses, Xyratex does not accept liability for the presence of any computer viruses
>> in this email, nor for any losses caused as a result of viruses.
>> 
>> Xyratex Technology Limited (03134912), Registered in England & Wales, Registered Office, Langstone
>> Road, Havant, Hampshire, PO9 1SA.
>> 
>> The Xyratex group of companies also includes, Xyratex Ltd, registered in Bermuda, Xyratex
>> International Inc, registered in California, Xyratex (Malaysia) Sdn Bhd registered in Malaysia,
>> Xyratex Technology (Wuxi) Co Ltd registered in The People's Republic of China and Xyratex Japan
>> Limited registered in Japan.
>> ______________________________________________________________________
>> 
>> 
>> _______________________________________________
>> Lustre-devel mailing list
>> Lustre-devel at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-devel
> 