[Lustre-devel] Sub Tree lock ideas.
Andreas Dilger
adilger at sun.com
Mon Jan 26 02:08:56 PST 2009
On Jan 21, 2009 15:49 -0500, Oleg Drokin wrote:
> So, I think it is a given we do not want to revoke a subtree lock
> every time somebody steps through it, because that will be too costly
> in a lot of cases.
A few comments that I have from the later discussions:
- you previously mentioned that only a single client would be able to
hold a subtree lock. I think it is critical that multiple clients be
able to get read subtree locks on the same directory. This would be
very important for use cases where many clients share a read-mostly
directory such as /usr/bin or /usr/lib.
- Alex (I think) suggested that the STL locks would only cover a single
directory and its contents, instead of an arbitrary-depth subtree.
While it seems somewhat appealing to have a single lock that covers
an entire subtree, the complexity of having to locate and manage
arbitrary-depth locks on the MDS might be too high.
Very deep subtrees are rare in most use cases; the common case is a
large number of files in a single directory, and a single-level
subtree lock will serve this use case equally well.
Having only a single level of subtree lock would avoid the need to
pass cookies to the MDS for anything other than the directory in
which names are being looked up.
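A minimal C sketch of the two properties discussed above, using hypothetical names (`enum stl_mode`, `struct stl`, `stl_compatible()`, `stl_covers()`) that are not Lustre APIs: read subtree locks on the same directory are mutually compatible, and a single-level lock covers only the directory itself and its immediate entries.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subtree-lock modes; names are illustrative, not Lustre's. */
enum stl_mode { STL_READ, STL_WRITE };

struct stl {
    uint64_t dir_fid;      /* directory the lock was granted on */
    enum stl_mode mode;
};

/* Shared read locks are mutually compatible, so many clients can cache a
 * read-mostly directory such as /usr/bin at once; any write conflicts. */
static bool stl_compatible(enum stl_mode held, enum stl_mode wanted)
{
    return held == STL_READ && wanted == STL_READ;
}

/* Single-level scope: the lock covers the directory itself and its
 * immediate entries only, so only the parent's fid is needed to decide. */
static bool stl_covers(const struct stl *lock, uint64_t fid,
                       uint64_t parent_fid)
{
    return fid == lock->dir_fid || parent_fid == lock->dir_fid;
}
```

With single-level scope, coverage is decided from the parent directory alone, which is why cookies for other path components would not need to be passed.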
> Anyway here is what I have in mind.
>
> STL locks could be granted by the server regardless of whether they
> were requested by the client or not.
>
> We would require clients to provide a lock "cookie" with every
> operation they perform; in the normal case that would be a handle
> they hold on the parent directory.
> This cookie should also identify which server it originates from
> (needed for CMD support).
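One way to make the cookie self-describing for CMD, sketched in C under an assumed layout (MDT index in the top 16 bits, per-MDT handle in the low 48 bits). The layout and the `stl_cookie_*` helpers are hypothetical and are not Lustre's actual lock handle format.

```c
#include <stdint.h>

/* Hypothetical cookie layout: top 16 bits name the MDT that granted the
 * lock, low 48 bits are a per-MDT lock handle. An assumption only. */
#define STL_MDT_SHIFT   48
#define STL_HANDLE_MASK ((UINT64_C(1) << STL_MDT_SHIFT) - 1)

/* Build a cookie from the granting MDT's index and its local handle. */
static uint64_t stl_cookie_make(uint16_t mdt_index, uint64_t handle)
{
    return ((uint64_t)mdt_index << STL_MDT_SHIFT) | (handle & STL_HANDLE_MASK);
}

/* Which MDT to forward a callback to, without any extra lookup. */
static uint16_t stl_cookie_mdt(uint64_t cookie)
{
    return (uint16_t)(cookie >> STL_MDT_SHIFT);
}

/* The handle that MDT uses to find the lock locally. */
static uint64_t stl_cookie_handle(uint64_t cookie)
{
    return cookie & STL_HANDLE_MASK;
}
```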
>
> For the case of a different client stepping into an area covered by
> an STL lock, this client would get the STL lock's cookie and would
> start presenting it for all subsequent operations (along with a
> special flag meaning that the client is not operating within the
> STL).
> When the server receives a request with a cookie that turns out to
> be for an STL lock, a callback is made to that lock (if necessary,
> through another server in the CMD case) that includes the currently
> accessed fid and access mode. The client that receives the callback
> does the necessary writeout of the object content: it flushes dirty
> data for a file, or any metadata changes for a directory (needed for
> the metadata writeback cache; this would be a no-op for r/o access
> to directories before WBC is implemented). Aside from that, if the
> operation is modifying, the STL-holding client would have to release
> the STL lock, and would have a choice of either completely flushing
> its cache for the subtree protected by the STL, or obtaining STLs
> for parts of the tree below the STL and retaining its cache for
> those subtrees.
> Additionally, for r/o access the STL-holding client would have the
> extra choices of doing nothing (besides the writeout of the object
> content) or allowing the server to issue a lock on that fid, in
> which case the client would first flush its own cache for the entire
> subtree starting at that fid.
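The choices open to the STL-holding client can be summarized as a small decision function. This is a hedged sketch: the enum names and the two policy inputs (`subtree_cache_valuable`, `fid_is_hot_for_others`) are hypothetical, and the writeout of dirty state for the accessed fid is assumed to have already been done by the caller.

```c
#include <stdbool.h>

/* Hypothetical: the access mode the server reports in the callback. */
enum access_mode { ACCESS_READ, ACCESS_MODIFY };

/* The responses available to the STL holder, as listed in the proposal. */
enum stl_response {
    STL_KEEP,              /* r/o: keep the STL, nothing beyond writeout */
    STL_GRANT_FID_LOCK,    /* r/o: drop cache under the fid, let server lock it */
    STL_RELEASE_FLUSH_ALL, /* modify: release STL, drop the whole cached subtree */
    STL_RELEASE_SPLIT      /* modify: release STL, take STLs on parts below it */
};

/* Picks what to do with the STL itself; the caller is assumed to have
 * written out dirty data/metadata for the accessed fid beforehand.
 * Both boolean policy inputs are illustrative assumptions. */
static enum stl_response stl_on_callback(enum access_mode mode,
                                         bool subtree_cache_valuable,
                                         bool fid_is_hot_for_others)
{
    if (mode == ACCESS_MODIFY)
        return subtree_cache_valuable ? STL_RELEASE_SPLIT
                                      : STL_RELEASE_FLUSH_ALL;
    return fid_is_hot_for_others ? STL_GRANT_FID_LOCK : STL_KEEP;
}
```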
> If the lock cookie presented by the accessing client turns out to be
> invalid (a rogue client, or the lock was already released), the
> server performs a reverse lookup up the tree (possibly crossing MDT
> boundaries) in search of a lock already granted to a client or the
> root of the tree, whichever is met first. If this lookup meets a
> lock and it happens to be an STL lock, its cookie is returned to the
> client along with an indication that an STL lock is present;
> otherwise normal operations with normal lock granting occur.
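The server-side reverse lookup can be sketched as a walk along parent pointers that stops at the first granted lock or at the root. The in-memory `struct mds_node` model and all names here are hypothetical; a real MDS would step by fid and might have to cross MDT boundaries at each parent.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory model of the namespace, for illustration only. */
struct mds_node {
    struct mds_node *parent;   /* NULL at the filesystem root */
    bool has_lock;             /* some client holds a lock here */
    bool lock_is_stl;          /* that lock is a subtree lock */
    unsigned long long cookie; /* cookie of that lock, if any */
};

enum lookup_result { FOUND_STL, FOUND_NORMAL, REACHED_ROOT };

/* Start at the directory named in the request and walk towards the root,
 * stopping at the first node with a granted lock. Only an STL's cookie is
 * handed back; a normal lock means normal lock granting applies. */
static enum lookup_result stl_reverse_lookup(const struct mds_node *n,
                                             unsigned long long *cookie_out)
{
    for (; n != NULL; n = n->parent) {
        if (!n->has_lock)
            continue;
        if (n->lock_is_stl) {
            *cookie_out = n->cookie;
            return FOUND_STL;
        }
        return FOUND_NORMAL;
    }
    return REACHED_ROOT;   /* no lock anywhere up to and including root */
}
```

A normal lock held on a process's CWD stops this walk early, which is what the CWD rule later in the message relies on.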
>
> When a client gets an STL lock for itself, it also performs all
> subsequent operations by presenting the STL lock handle. It might
> get a reply from the server indicating that the entry being accessed
> is "shared" (determined by the server as an opened file, or an inode
> on which any locks are granted to any clients), along with a normal
> lock if needed (or, if this area of the tree is covered by somebody
> else's STL, that STL's cookie). All metadata cached on behalf of an
> STL lock is marked as such in the client's cache.
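A sketch of how the server might classify an entry accessed under the requester's own STL, assuming a hypothetical per-inode summary (`struct mds_inode_state`) with open and granted-lock counts plus a flag for coverage by another client's STL.

```c
#include <stdbool.h>

/* Hypothetical per-inode state kept on the MDS. */
struct mds_inode_state {
    unsigned open_count;          /* opens from any client */
    unsigned granted_lock_count;  /* locks granted to any client */
    bool under_foreign_stl;       /* covered by another client's STL */
};

enum stl_reply {
    REPLY_COVERED,            /* client may cache it as STL-protected */
    REPLY_SHARED_NORMAL_LOCK, /* entry is "shared": use a normal lock */
    REPLY_FOREIGN_STL         /* return the other STL's cookie instead */
};

/* Classify an entry reached by a request presenting the client's own STL. */
static enum stl_reply stl_classify(const struct mds_inode_state *st)
{
    if (st->under_foreign_stl)
        return REPLY_FOREIGN_STL;
    if (st->open_count > 0 || st->granted_lock_count > 0)
        return REPLY_SHARED_NORMAL_LOCK;
    return REPLY_COVERED;
}
```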
>
> This approach allows for a dynamically growing STL tree that can be
> cut at any level (by the presence of a lock in some part of the
> tree). Initially, when issued, an STL lock would span from the root
> of the subtree it was issued on to any points where other clients
> might have cached information (or, if no other clients hold locks
> there, the entire subtree), and afterwards some of the sub-subtrees
> can be cut from the subtree protected by the STL. This also allows
> for nested STLs held by different clients.
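To illustrate how a lock elsewhere in the tree cuts an STL, this hypothetical sketch counts the nodes that an STL granted on a given directory actually protects, descending until it meets a node where another lock (normal, or a nested STL) is held. The first-child/next-sibling `struct tnode` is an illustrative model only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative first-child/next-sibling tree; not an MDS structure. */
struct tnode {
    struct tnode *child;
    struct tnode *sibling;
    bool other_lock;   /* another lock (normal or nested STL) cuts here */
};

/* Count the nodes protected by an STL granted on `root`: the node holding
 * another lock, and everything beneath it, is excluded from the extent. */
static size_t stl_extent(const struct tnode *root)
{
    size_t n = 1;  /* the STL root itself */
    for (const struct tnode *c = root->child; c != NULL; c = c->sibling)
        if (!c->other_lock)
            n += stl_extent(c);
    return n;
}
```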
> One important thing that needs to be done in this scenario is to
> ensure that any process with its CWD on Lustre holds a lock on that
> directory if possible (of course, we cannot refuse revocation of
> this lock if other clients want to modify the directory contents).
> This would allow us to avoid costly reverse lookups to find out
> whether we are under any STL lock when we operate from a CWD on
> Lustre (the STL lock would simply be cut at the CWD point by the
> normal lock).
>
> We would need to implement cross-MDT lock callbacks.
>
> I think it is safe to depend on clients to provide locks, since if
> they don't, or provide invalid ones, we can detect this (and we can
> couple locks with some secure tokens if needed too). The only
> downside is that rogue clients would be able to slow down servers by
> making them do all the reverse lookups (though if we just refuse to
> talk to clients that present, for a non-root inode of the FS,
> invalid locks that were never present in the system, that should be
> somewhat mitigated).
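The rogue-client mitigation can be sketched as a cookie check, assuming (hypothetically) that the server issues handles from a monotonically increasing counter, so any handle at or above the high-water mark was never granted. Such a counter would have to survive a server restart for this to hold; `struct lock_table` and `check_cookie()` are illustrative names only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical server-side lock table: handles are issued from a
 * monotonically increasing counter starting at 1. */
struct lock_table {
    uint64_t next_handle;  /* next handle to issue */
    const bool *live;      /* live[h] is true if handle h is still granted */
};

enum cookie_check {
    COOKIE_VALID,   /* lock exists: proceed (callback if it is an STL) */
    COOKIE_LOOKUP,  /* stale or root: fall back to the reverse lookup */
    COOKIE_REFUSE   /* never issued, on a non-root inode: refuse client */
};

static enum cookie_check check_cookie(const struct lock_table *t,
                                      uint64_t h, bool inode_is_fs_root)
{
    if (h == 0 || h >= t->next_handle)
        /* Never issued by this server: tolerated only on the FS root. */
        return inode_is_fs_root ? COOKIE_LOOKUP : COOKIE_REFUSE;
    return t->live[h] ? COOKIE_VALID : COOKIE_LOOKUP;
}
```

`COOKIE_LOOKUP` would trigger the reverse lookup described earlier, while `COOKIE_REFUSE` corresponds to refusing to talk to the client.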
> The other alternative is to mark every server dentry with an STL
> marker during traversal, but in that case recovery after a server
> restart becomes somewhat problematic, so I do not think this is a
> good idea.
>
>
> Bye,
> Oleg
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.