[Lustre-devel] Lustre dcache clean up and distributed locking

tao.peng at emc.com
Wed Feb 22 19:13:12 PST 2012


Hi all,

I cleaned up the design a bit more to reflect the changes in Siyao's latest dcache patch. Some of the previously mentioned dcache cleanups were removed because they are already addressed in the dcache scalability patch. The remaining task is now mainly about hiding the DCACHE_LUSTRE_INVALID flag. Please see the attachment for details; I also uploaded it to LU-1113.
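
For quick reference, the core of the remaining change is to keep the invalid state in Lustre-private per-dentry data instead of in dentry::d_flags. A minimal sketch, assuming the flag and structure names from the document (final names may still change during review):

#define LLD_LUSTRE_INVALID 0x01 /* dentry lost its MDS LOOKUP lock */

struct ll_dentry_data {
        /* ... existing fields ... */
        unsigned int lld_flags; /* Lustre-private; never seen by the VFS */
};

#define ll_d2d(de) ((struct ll_dentry_data *)((de)->d_fsdata))

/* assumes d_fsdata was allocated when Lustre instantiated the dentry */
static inline int d_lustre_invalid(const struct dentry *dentry)
{
        return ll_d2d(dentry)->lld_flags & LLD_LUSTRE_INVALID;
}

Because the flag lives entirely behind d_fsdata, the generic VFS layer never needs to know about it.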

Cheers,
Tao


> -----Original Message-----
> From: Peng Tao [mailto:bergwolf at gmail.com]
> Sent: Sunday, February 12, 2012 11:46 AM
> To: Yong Fan
> Cc: Peng, Tao; adilger at whamcloud.com; Tang, Haiying; faibish, sorin; Liu, Xuezhao;
> laisiyao at whamcloud.com; green at whamcloud.com; eeb at whamcloud.com; bryon at whamcloud.com; lustre-
> devel at lists.lustre.org
> Subject: Re: Lustre dcache clean up and distributed locking
>
> Hi Yong,
>
> On Sat, Feb 11, 2012 at 11:58 PM, Yong Fan <yong.fan at whamcloud.com> wrote:
> > Hi Tao,
> >
> > The new design looks OK to me.
> Thanks for reviewing!
>
> > BTW, you have removed the section about "ll_lookup_lock" from the first
> > version of the document. So you will not change the related logic, right?
> > I think this part should be fixed in new kernels.
> "ll_lookup_lock" is removed in Siyao's latest dcache patch and
> replaced by inode lock. So I removed the section about
> "ll_lookup_lock" from the design and I will not change related logic.
>
> Cheers,
> Tao
> >
> > Cheers,
> > Nasf
> >
> >
> > On Thu, Feb 9, 2012 at 3:15 PM, <tao.peng at emc.com> wrote:
> >>
> >>
> >> > -----Original Message-----
> >> > From: Fan Yong [mailto:yong.fan at whamcloud.com]
> >> > Sent: Saturday, February 04, 2012 12:18 AM
> >> > To: Peng, Tao
> >> > Cc: adilger at whamcloud.com; bergwolf at gmail.com; Tang, Haiying; faibish,
> >> > sorin; Liu, Xuezhao;
> >> > laisiyao at whamcloud.com; green at whamcloud.com; eeb at whamcloud.com;
> >> > bryon at whamcloud.com; lustre-
> >> > devel at lists.lustre.org
> >> > Subject: Re: Lustre dcache clean up and distributed locking
> >> >
> >> > On 2/3/12 3:37 PM, tao.peng at emc.com wrote:
> >> > >> -----Original Message-----
> >> > >> From: Fan Yong [mailto:yong.fan at whamcloud.com]
> >> > >> Sent: Tuesday, January 31, 2012 8:02 PM
> >> > >> To: Peng, Tao
> >> > >> Cc: Andreas Dilger; bergwolf at gmail.com; Tang, Haiying; faibish,
> >> > >> sorin; Liu, Xuezhao;
> >> > >> laisiyao at whamcloud.com; green at whamcloud.com; eeb at whamcloud.com;
> >> > >> bryon at whamcloud.com; lustre-
> >> > >> devel at lists.lustre.org
> >> > >> Subject: Re: Lustre dcache clean up and distributed locking
> >> > >>
> >> > >> On 1/31/12 7:27 AM, Andreas Dilger wrote:
> >> > >>> On 2012-01-30, at 12:46 AM, <tao.peng at emc.com> wrote:
> >> > >>>> Sorry for the late reply. I got your email earlier but didn't have
> >> > >>>> a chance to read the document due to very limited internet access.
> >> > >>>> Thank you very much for taking a detailed look at the document and
> >> > >>>> giving so many valuable comments.
> >> > >>>>
> >> > >>>> About adding a new inode bitlock, I must admit that I missed the
> >> > >>>> hardlink case. Although the lock mode can be changed to be the same
> >> > >>>> as the LOOKUP lock, I agree that an inode bitlock cannot
> >> > >>>> sufficiently protect every name/dentry separately for hardlinked
> >> > >>>> files. And I agree that, as you suggested in the document, adding a
> >> > >>>> name entry based lock can solve it.
> >> > >>>> IIUC, to add a name entry based lock, both client and server need
> >> > >>>> to be modified, and LDLM needs to be changed to manage the new lock
> >> > >>>> information. Like the new inode bitlock, the name entry based lock
> >> > >>>> will also introduce interoperability issues as a result of the
> >> > >>>> on-wire protocol change. And in a DNE environment, as Andreas
> >> > >>>> explained, the original plan is to take the LOOKUP lock on both the
> >> > >>>> master MDT and the remote MDT. If we introduce a name entry based
> >> > >>>> lock, the DNE case may need some changes as well.
> >> > >>>> I will take a closer look at the approach. In the meantime, could
> >> > >>>> you please confirm that it is the right way to go? The benefit of
> >> > >>>> the above complexity seems to be better distributed dentry locking
> >> > >>>> under the dcache framework.
> >> > >>>> What do others think? Andreas?
> >> > >>> Tao, before we go down the road of implementing such a change,
> >> > >>> there would have to be clear benefits to performance and/or kernel
> >> > >>> integration before we could accept the interoperability issues and
> >> > >>> added code complexity.
> >> > >>> The benefits of having a per-name lock, as I see them, are:
> >> > >>> - can be cancelled separately from other name locks
> >> > >>> - can be changed independently from the inode permissions (LOOKUP)
> >> > >>> - potential ease of integration with the Linux dcache
> >> > >>>
> >> > >>> The drawbacks of introducing a per-name lock:
> >> > >>> - only useful for hard-linked files (uncommon case)
> >> > >>> - added code complexity for a new lock type
> >> > >>> - incompatibility with existing client/server
> >> > >>> - additional code to handle incompatibility transparently
> >> > >>> - lock enqueue may need extra RPC for each name lock (different
> >> > >>> resource)
> >> > >>> - lock cancellation may need an extra RPC for each resource
> >> > >>>
> >> > >>> The way I see it is that the DENTRY lock would need to be a
> >> > >>> different resource for each name on a file.  The only way this is
> >> > >>> going to be useful AFAICS is if the client enqueues each DENTRY lock
> >> > >>> on a separate resource, so that it can be granted and cancelled
> >> > >>> independently of the LOOKUP lock (which is on the FID and would be
> >> > >>> common for all names that reference this inode).  That would either
> >> > >>> need an extra RPC for every filename lookup (to enqueue both the
> >> > >>> DENTRY and LOOKUP locks) or a new LDLM_ENQUEUE RPC that can request
> >> > >>> and be granted two different locks (FID [seq, oid, ver, 0], and
> >> > >>> FID+name hash [seq, oid, ver, hash]) in a single RPC.
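> >> > >>>
> >> > >>> Concretely, the two resources might be named like this (an
> >> > >>> illustrative sketch only, reusing the existing fid_*() helpers; the
> >> > >>> name hash is a placeholder, any stable hash of the name would do):
> >> > >>>
> >> > >>> static void fill_res_ids(const struct lu_fid *fid,
> >> > >>>                          const struct qstr *name,
> >> > >>>                          struct ldlm_res_id *lookup_res,
> >> > >>>                          struct ldlm_res_id *dentry_res)
> >> > >>> {
> >> > >>>         /* LOOKUP lock: one resource per inode (FID) */
> >> > >>>         lookup_res->name[0] = fid_seq(fid);
> >> > >>>         lookup_res->name[1] = fid_oid(fid);
> >> > >>>         lookup_res->name[2] = fid_ver(fid);
> >> > >>>         lookup_res->name[3] = 0;
> >> > >>>
> >> > >>>         /* DENTRY lock: one resource per (FID, name) pair */
> >> > >>>         dentry_res->name[0] = fid_seq(fid);
> >> > >>>         dentry_res->name[1] = fid_oid(fid);
> >> > >>>         dentry_res->name[2] = fid_ver(fid);
> >> > >>>         dentry_res->name[3] = full_name_hash(name->name, name->len);
> >> > >>> }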
> >> > >>> So, the main question is how much better the client dcache/VFS
> >> > >>> integration will be with the DCACHE lock compared to the existing
> >> > >>> mechanism with the LOOKUP (FID) lock?  Is it impossible (or very
> >> > >>> difficult) to use the existing mechanism for the new lockless dcache
> >> > >>> code?  Or is it a matter of this being a somewhat cleaner
> >> > >>> implementation, but not critical for moving forward?
> >> > >>> I'm aware that our dcache code has been too complex for some time,
> >> > >>> but I haven't looked recently to see how much of this is due to
> >> > >>> using/supporting old kernel APIs, and how much of it is due to the
> >> > >>> use of the LOOKUP lock on the FID protecting too much of the state
> >> > >>> on the client.
> >> > >>> Cheers, Andreas
> >> > >> Hi Tao,
> >> > >>
> >> > >> To introduce either a per-name lock or a DENTRY lock is mainly for
> >> > >> removing the Lustre-special flag "DCACHE_LUSTRE_INVALID" to simplify
> >> > >> Lustre code. But if such simplification causes more complex ldlm
> >> > >> logic, we have to consider whether it is worth removing
> >> > >> "DCACHE_LUSTRE_INVALID". It is not impossible to get
> >> > >> "DCACHE_LUSTRE_INVALID" accepted by the Linux kernel. On the other
> >> > >> hand, "DCACHE_LUSTRE_INVALID" can also work together with lockless
> >> > >> dcache in new Linux kernels.
> >> > >>
> >> > >> So to push the project forward, we can divide it into two parts
> >> > >> (according to your document): Lustre dcache code cleanup, and some
> >> > >> new ldlm logic to remove "DCACHE_LUSTRE_INVALID". Before we reach an
> >> > >> agreement on the second part, we can start with the first part to
> >> > >> clean up the Lustre dcache code. What do you think?
> >> > >>
> >> > > Thank you, Andreas and Yong. I tend to agree with you that we should
> >> > > avoid the complexity if there are no obvious benefits. However, as we
> >> > > need to make sure the code is acceptable to the kernel community, I
> >> > > will try to consolidate a design for adding the new type of lock but
> >> > > just not implement it at this point. And if later we are asked (by
> >> > > kernel community developers) to do it, we just need to do the
> >> > > implementation.
> >> >
> >> > Hi Tao,
> >> >
> >> > We have another way to remove the Lustre-special dentry flag
> >> > "DCACHE_LUSTRE_INVALID". To indicate a dentry that is still hashed in
> >> > the dcache but no longer protected by a LOOKUP lock, we can add a new
> >> > flag in "ll_dentry_data" (pointed to by "dentry::d_fsdata"). Since it is
> >> > Lustre-private data, there is no need to worry about whether the new
> >> > flag conflicts with any standard VFS "dentry::d_flags", and it may save
> >> > the effort of implementing the complex DENTRY lock from your original
> >> > design.
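> >> >
> >> > For example (purely illustrative; the flag name, the helper name and
> >> > the use of d_lock to serialize updates are all placeholders), the
> >> > LOOKUP lock cancellation path could do this instead of touching
> >> > dentry::d_flags:
> >> >
> >> > static inline void d_lustre_invalidate(struct dentry *dentry)
> >> > {
> >> >         struct ll_dentry_data *lld = dentry->d_fsdata;
> >> >
> >> >         /* replaces: dentry->d_flags |= DCACHE_LUSTRE_INVALID */
> >> >         spin_lock(&dentry->d_lock);
> >> >         lld->lld_flags |= LLD_LUSTRE_INVALID;
> >> >         spin_unlock(&dentry->d_lock);
> >> > }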
> >> >
> >> Hi Fan Yong,
> >>
> >> Thanks for your great suggestion. I updated the design doc with the
> >> following changes; please help to review. Thank you!
> >>
> >> Changelog:
> >> 1. Macro cleanups are removed. I prefer to do them separately while
> >> cleaning up other obsolete macros.
> >> 2. Remove the DENTRY LDLM lock proposal. Because the
> >> DCACHE_LUSTRE_INVALID flag can be hidden from the generic VFS layer, it
> >> is no longer necessary to introduce such a complex change.
> >> 3. Add a section describing how to add an LLD_LUSTRE_INVALID flag to the
> >> ll_dentry_data structure, replacing the existing DCACHE_LUSTRE_INVALID
> >> flag.
> >>
> >> Cheers,
> >> Tao
> >> > Cheers,
> >> > Nasf
> >> >
> >> > > Please stay tuned.
> >> > >
> >> > > Best,
> >> > > Tao
> >> > >
> >> > >> Cheers,
> >> > >> --
> >> > >> Nasf
> >> > >>
> >> > >>>
> >> > >>>>> -----Original Message-----
> >> > >>>>> From: Fan Yong [mailto:yong.fan at whamcloud.com]
> >> > >>>>> Sent: Saturday, January 21, 2012 1:11 AM
> >> > >>>>> To: Peng, Tao; bergwolf at gmail.com
> >> > >>>>> Cc: Tang, Haiying; adilger at whamcloud.com; faibish, sorin; Liu,
> >> > >>>>> Xuezhao; laisiyao at whamcloud.com;
> >> > >>>>> green at whamcloud.com; eeb at whamcloud.com; Bryon Neitzel
> >> > >>>>> Subject: Re: Lustre dcache clean up and distributed locking
> >> > >>>>>
> >> > >>>>> Hi Tao,
> >> > >>>>>
> >> > >>>>> Excellent work. I just added some comments inside your document.
> >> > >>>>> Please check.
> >> > >>>>>
> >> > >>>>>
> >> > >>>>> Best Regards,
> >> > >>>>> Fan Yong
> >> > >>>>> Whamcloud, Inc.
> >> > >>>>>
> >> > >>>>> On 1/20/12 9:34 AM, haiying.tang at emc.com wrote:
> >> > >>>>>> Hi Andreas,
> >> > >>>>>>
> >> > >>>>>> Tao is on vacation now. It might be difficult for him to check
> >> > >>>>>> emails due to limited internet access at his hometown.
> >> > >>>>>> For urgent issues, you folks could send email to his gmail
> >> > >>>>>> account bergwolf at gmail.com.
> >> > >>>>>>
> >> > >>>>>> Thanks,
> >> > >>>>>> Haiying
> >> > >>>>>>
> >> > >>>>>> -----Original Message-----
> >> > >>>>>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
> >> > >>>>>> Sent: January 20, 2012 1:39
> >> > >>>>>> To: Peng, Tao
> >> > >>>>>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao;
> >> > >>>>>> laisiyao at whamcloud.com; yong.fan at whamcloud.com;
> >> > >>>>>> green at whamcloud.com;
> >> > >>>>>> eeb at whamcloud.com
> >> > >>>>>> Subject: Re: Lustre dcache clean up and distributed locking
> >> > >>>>>>
> >> > >>>>>> On 2012-01-17, at 3:21 AM, <tao.peng at emc.com> wrote:
> >> > >>>>>>> Thanks Siyao and Oleg for answering my dcache revalidation
> >> > >>>>>>> question on the lustre mailing list. I updated the design to
> >> > >>>>>>> reflect it. Please see the attachment.
> >> > >>>>>> Tao,
> >> > >>>>>> Fan Yong is also taking a more detailed look at your document
> >> > >>>>>> and will hopefully have a chance to reply before the New Year
> >> > >>>>>> holidays.
> >> > >>>>>>
> >> > >>>>>> Also, we are just working on landing the patches to add support
> >> > >>>>>> for Linux 2.6.38 to the Lustre client.  One of the patches
> >> > >>>>>> relates to the lockless dcache changes that were introduced in
> >> > >>>>>> that kernel.  If you are interested in reviewing this patch, and
> >> > >>>>>> in becoming more familiar with the Lustre development process,
> >> > >>>>>> you should visit http://review.whamcloud.com/1865 for the patch.
> >> > >>>>>>
> >> > >>>>>> You need to create an account in Gerrit using OpenID (Google,
> >> > >>>>>> mostly), and an account in our bug tracking system
> >> > >>>>>> (http://jira.whamcloud.com) if you haven't already.
> >> > >>>>>>
> >> > >>>>>>>> -----Original Message-----
> >> > >>>>>>>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
> >> > >>>>>>>> Sent: Tuesday, January 17, 2012 4:16 PM
> >> > >>>>>>>> To: Peng, Tao
> >> > >>>>>>>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao; Lai Siyao; Fan
> >> > >>>>>>>> Yong; Oleg Drokin; Eric Barton
> >> > >>>>>>>> Subject: Re: Lustre dcache clean up and distributed locking
> >> > >>>>>>>>
> >> > >>>>>>>> On 2012-01-16, at 9:25 PM, <tao.peng at emc.com> wrote:
> >> > >>>>>>>>> I finally started to work on the Lustre dcache cleanup and
> >> > >>>>>>>>> locking. After reading the Lustre, ocfs2 and VFS dcache
> >> > >>>>>>>>> related code, I came to a design for cleaning up the Lustre
> >> > >>>>>>>>> dcache code and doing distributed locking under dcache. For
> >> > >>>>>>>>> distributed locking, the main idea is to add a new inode
> >> > >>>>>>>>> bitlock, the DENTRY lock, to protect just the valid dentry,
> >> > >>>>>>>>> instead of letting the LOOKUP lock serve multiple purposes,
> >> > >>>>>>>>> which leaves the client unable to know whether a file was
> >> > >>>>>>>>> deleted when the server cancels the LOOKUP lock. Instead, the
> >> > >>>>>>>>> client holds a PR mode DENTRY lock on all valid dentries, and
> >> > >>>>>>>>> when the server cancels it, the client knows the file was
> >> > >>>>>>>>> deleted.
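> >> > >>>>>>>>> Conceptually, the client side would look something like the
> >> > >>>>>>>>> following rough sketch (the callback name and the wiring of
> >> > >>>>>>>>> the dentry into the lock's private data are invented for
> >> > >>>>>>>>> illustration):
> >> > >>>>>>>>>
> >> > >>>>>>>>> static int ll_dentry_blocking_ast(struct ldlm_lock *lock,
> >> > >>>>>>>>>                                   struct ldlm_lock_desc *desc,
> >> > >>>>>>>>>                                   void *data, int flag)
> >> > >>>>>>>>> {
> >> > >>>>>>>>>         struct dentry *dentry = data;
> >> > >>>>>>>>>
> >> > >>>>>>>>>         if (flag == LDLM_CB_CANCELING) {
> >> > >>>>>>>>>                 /* the server cancels the PR DENTRY lock only
> >> > >>>>>>>>>                  * when the name is unlinked, so the dentry
> >> > >>>>>>>>>                  * can be dropped right away */
> >> > >>>>>>>>>                 d_drop(dentry);
> >> > >>>>>>>>>         }
> >> > >>>>>>>>>         return 0;
> >> > >>>>>>>>> }
> >> > >>>>>>>>>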
> >> > >>>>>>>>> For details, please see the attachments (I attached both Word
> >> > >>>>>>>>> and PDF versions as I am not sure which is more convenient for
> >> > >>>>>>>>> you :). Please help to review and comment. Thank you!
> >> > >>>>>>>>
> >> > >>>>>>>> Hi Tao,
> >> > >>>>>>>> I'm passing this on to the engineers who are most familiar
> >> > >>>>>>>> with the DLM and client dcache code in Lustre.  After a quick
> >> > >>>>>>>> first read through your document, your investigation of the
> >> > >>>>>>>> client code is very insightful and looks like it will be able
> >> > >>>>>>>> to remove some of the significant complexity that has grown
> >> > >>>>>>>> over time in the llite code.
> >> > >>>>>>>> Cheers, Andreas
> >> > >>>>>>>> --
> >> > >>>>>>>> Andreas Dilger                       Whamcloud, Inc.
> >> > >>>>>>>> Principal Engineer                   http://www.whamcloud.com/
> >> > >>>>>>>>
> >> > >>>>>>>>
> >> > >>>>>>> <dcache_distributed_locking-v2.docx>
> >> > >>>>>> Cheers, Andreas
> >> > >>>>>> --
> >> > >>>>>> Andreas Dilger                       Whamcloud, Inc.
> >> > >>>>>> Principal Engineer                   http://www.whamcloud.com/
> >> > >>>>>>
> >> > >>>>>>
> >> > >>>>>>
> >> > >>>>>>
> >> > >>>>>>
> >> > >>> Cheers, Andreas
> >> > >>> --
> >> > >>> Andreas Dilger                       Whamcloud, Inc.
> >> > >>> Principal Engineer                   http://www.whamcloud.com/
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>>
> >> >
> >>
> >

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dcache_cleanup_v4.docx
Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Size: 24430 bytes
Desc: dcache_cleanup_v4.docx
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20120222/36c6b598/attachment.bin>

