[Lustre-devel] Wide striping

Eric Barton eeb at whamcloud.com
Wed Oct 5 11:02:22 PDT 2011


Shadow,

Your comment describes create-on-write (CROW), which is vulnerable
to orphan creation by clients which have been evicted from the MDS
but are not actually dead, unless further safeguards are implemented
such as capabilities or server-cluster-wide client eviction.
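
To make the failure mode concrete, here is a toy model of one way the orphan can arise. It is only a sketch under assumptions, not Lustre code: `CrowOst`, `orphans` and `run` are hypothetical names, FIDs and clients are plain strings, and "server-cluster-wide eviction" is reduced to the OST holding a set of evicted clients.

```python
class CrowOst:
    """Toy create-on-write OST: the first write to an unknown FID creates the object."""

    def __init__(self):
        self.objects = {}     # FID -> data written so far
        self.evicted = set()  # clients this OST has been told are evicted

    def write(self, client, fid, data):
        if client in self.evicted:
            return False      # refused: eviction is known on this server
        self.objects.setdefault(fid, bytearray()).extend(data)
        return True


def orphans(ost, mds_fids):
    """Objects on the OST that no MDS namespace entry refers to."""
    return set(ost.objects) - set(mds_fids)


def run(cluster_wide_eviction):
    ost = CrowOst()
    mds_fids = {"f1"}             # client c1 created file f1 on the MDS
    mds_fids.discard("f1")        # MDS evicts c1 and the file is then unlinked
    if cluster_wide_eviction:
        ost.evicted.add("c1")     # eviction propagated to the OST as well
    ost.write("c1", "f1", b"late write")  # c1 is still alive and still writes
    return orphans(ost, mds_fids)


unguarded = run(cluster_wide_eviction=False)
guarded = run(cluster_wide_eviction=True)
```

Without the propagated eviction the late write recreates an object that nothing in the namespace refers to; with it, the write is refused and no orphan appears.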

I also think that the decision to use FIDs in the way you suggest
has architectural implications which would benefit from further
discussion.  The original idea was that a FID would be all you need
to identify any object (including its target) and that using them
uniformly in this way could help simplify the code and enable further
development - e.g. to allow unified targets which mix namespace and
data objects to better support small/sparse files.

Making the FID just a unique identifier which requires a target index
to specify a specific object doesn't have to be inconsistent with
uniform usage for data and metadata, but it has further knock-on
implications which must be acknowledged and debated explicitly
before we go further.  We really must be confident we've thought
through all the consequences of our architectural decisions before
we invest development effort in them.  It's just too expensive to
reverse a bad decision otherwise.

          Cheers,
                   Eric

> -----Original Message-----
> From: lustre-devel-bounces at lists.lustre.org [mailto:lustre-devel-bounces at lists.lustre.org] On Behalf
> Of Alexey Lyashkov
> Sent: 05 October 2011 10:29 AM
> To: wangdi
> Cc: Alexander Boyko; Lustre Development Mailing List; Artem Blagodarenko; Nathan Rutman
> Subject: Re: [Lustre-devel] Wide striping
> 
> Hi All,
> >
> > FID-on-OST is actually part of DNE (distributed namespace) phase I.  It basically follows the current
> > FID client/server infrastructure.
> >
> > 1. The MDT is the FID client: it requests FIDs from the OST and allocates FIDs for objects during
> > pre-creation.
> > 2. The OST is the FID server: it allocates FIDs to MDTs and requests a super FID sequence from
> > the FID control server (root MDT).
> > 3. As with MDT FIDs, there will be an OI to map FIDs to objects inside the OST.
> >
> > The code will be released with DNE sometime next year.
> >
> I think we do not need special FIDs for OST objects, unless we want to migrate an object between
> different data containers across the cluster.
> I think that is not a priority for now,
> so we can simplify FID management for the OST now.
> Each data object may be identified by the pair {OST_INDEX / OST_UUID, MDT_FID}.
> In that case the OST does not need to allocate any FIDs, and the MDT can reuse the current preallocation scheme.
> In fact we do not need to assign a FID to an OST object at file creation time (i.e. when creating the LSM), but we
> need a guarantee that a free OST object exists when a client tries to access that object.
> In that case the OST can preallocate a pool and report its size to the MDT.
> The MDT knows it uses some objects from that pool, but does not know which object id is assigned to a file.
> To avoid confusion on the OST, the client sends the MDT FID to the OST when it needs access to an OST object.
> The OST looks in the OI database and checks whether that FID is assigned to anything.
> If it is assigned, the lookup returns the inode; otherwise the OST grabs any free object from the pool and
> assigns it to that FID.
> That's all.
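
The lookup-or-assign step described above can be sketched roughly as follows. This is an illustrative model in Python, not Lustre code: the class `OstModel`, its `oi` dictionary and the method `object_for` are hypothetical names, the free pool is just a list of object ids, and the FID value is made up.

```python
class OstModel:
    """Toy model of an OST that binds MDT FIDs to objects on first access."""

    def __init__(self, pool_size):
        # Precreated, not-yet-assigned object ids (the pool reported to the MDT).
        self.free_pool = list(range(1, pool_size + 1))
        # Object index (OI): MDT FID -> object id.
        self.oi = {}

    def free_count(self):
        """Size of the remaining pool, as it would be reported to the MDT."""
        return len(self.free_pool)

    def object_for(self, mdt_fid):
        """Return the object bound to mdt_fid, binding a free one if needed."""
        objid = self.oi.get(mdt_fid)
        if objid is None:
            if not self.free_pool:
                raise RuntimeError("no preallocated objects left")
            objid = self.free_pool.pop(0)
            self.oi[mdt_fid] = objid
        return objid


ost = OstModel(pool_size=4)
fid = (0x200000400, 0x1, 0x0)  # made-up (sequence, oid, version) tuple
first = ost.object_for(fid)    # first access binds a free object
again = ost.object_for(fid)    # later accesses find the same object via the OI
```

In this model the MDT only ever sees the pool size shrink; which object id ended up behind a given FID is known to the OST alone.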
> 
> Orphan cleanup does not need to change in that case: the MDT sends the last allocated objid, and the OST
> destroys the unallocated objects and returns the last index to the MDT.
> The open-unlink case needs to change to put a FID in the LLOG record, and the OST needs to be changed to handle
> a FID as the object index.
> 
> 
> 
> --------------------------------------------
> Alexey Lyashkov
> alexey_lyashkov at xyratex.com
> 
> 
> 
> 



