[Lustre-devel] Sub Tree lock ideas.
Oleg.Drokin at Sun.COM
Wed Jan 21 12:49:49 PST 2009
We discussed a bit of this in Beijing last week, but decided to
continue the discussion via email.
So, I think it is a given we do not want to revoke a subtree lock
every time somebody steps through it, because that will be too costly
in a lot of cases.
Anyway, here is what I have in mind.
STL locks could be granted by the server regardless of whether the
client requested them.
We would require clients to provide a lock "cookie" with every
operation they perform; in the normal case that would be a handle they
have on a parent directory.
This cookie should make it possible to find out which server it
originates from (needed for CMD support).
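To make the cookie idea a bit more concrete, here is a minimal sketch
in Python. It assumes, purely for illustration, that the index of the
granting MDT is packed into the high bits of a 64-bit value above a
per-server lock handle. The names LockCookie, pack and unpack and the
bit widths are hypothetical, not an existing Lustre interface.

```python
from dataclasses import dataclass

MDT_BITS = 16      # assumption: width of the MDT index field
HANDLE_BITS = 48   # assumption: width of the per-server lock handle


@dataclass(frozen=True)
class LockCookie:
    mdt_index: int  # server that granted the lock
    handle: int     # opaque lock handle, meaningful only on that server

    def pack(self) -> int:
        """Encode the cookie as one integer a client could send."""
        if not 0 <= self.mdt_index < (1 << MDT_BITS):
            raise ValueError("mdt_index out of range")
        if not 0 <= self.handle < (1 << HANDLE_BITS):
            raise ValueError("handle out of range")
        return (self.mdt_index << HANDLE_BITS) | self.handle

    @staticmethod
    def unpack(raw: int) -> "LockCookie":
        """Recover the originating server and handle from the wire value."""
        return LockCookie(raw >> HANDLE_BITS, raw & ((1 << HANDLE_BITS) - 1))
```

With something like this, any server that receives a cookie can tell
from mdt_index alone where the callback has to be routed.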
When a different client steps into an area covered by an STL lock,
that client would get the STL lock's cookie and would start presenting
it for all subsequent operations (along with a special flag meaning
that the client is not operating within the STL).
When the server receives a request with a cookie that turns out to
belong to an STL lock, a callback is made to that lock (if necessary,
through another server in the CMD case), and information about the
currently accessed fid and the access mode is included. The client
that the callback ends up on will do the necessary writeout of the
object content: flush dirty data in the case of a file, or flush any
metadata changes in the case of a directory (needed for the metadata
writeback cache; this would be a server no-op for r/o access to
directories before WBC is implemented). Aside from that, if the
operation is modifying, the STL-holding client would have to release
the STL lock, and would have a choice of either completely flushing
its cache for the subtree protected by the STL or obtaining STLs for
parts of the tree below the STL and retaining its cache for those
subtrees.
Additionally, for r/o access the STL-holding client would have the
extra choices of doing nothing (besides the writeout of the object
content) or allowing the server to issue a lock on that fid, in which
case the client would first flush its own cache for the entire subtree
starting at that fid.
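The choices open to the STL-holding client when it receives such a
callback can be summarized in a small sketch. This is only an
illustration of the decision described above, not an implementation.
The enum names and the keep_cache / allow_lock knobs are hypothetical,
and the result is an unordered set because the ordering of the steps is
not settled here.

```python
from enum import Enum, auto


class Access(Enum):
    READ = auto()
    MODIFY = auto()


class Action(Enum):
    WRITE_OUT_OBJECT = auto()     # flush dirty data / metadata of the fid
    RELEASE_STL = auto()          # give up the subtree lock
    DROP_STL_CACHE = auto()       # flush cache for the whole STL subtree
    TAKE_SUB_STLS = auto()        # obtain STLs lower down, keep that cache
    DROP_FID_SUBTREE = auto()     # flush cache for the subtree at the fid
    LET_SERVER_LOCK_FID = auto()  # allow the server to issue a lock on the fid


def on_stl_callback(access, keep_cache=False, allow_lock=False):
    """Return the set of actions the STL holder takes for one callback."""
    actions = {Action.WRITE_OUT_OBJECT}  # always needed
    if access is Access.MODIFY:
        actions.add(Action.RELEASE_STL)
        actions.add(Action.TAKE_SUB_STLS if keep_cache
                    else Action.DROP_STL_CACHE)
    elif allow_lock:
        actions |= {Action.DROP_FID_SUBTREE, Action.LET_SERVER_LOCK_FID}
    return actions
```

For a plain r/o access with allow_lock left off, the only action is the
writeout of the accessed object, which matches the "do nothing else"
choice.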
If the lock cookie presented by the accessing client is determined
to be invalid (rogue client, or the lock was already released), the
server performs a reverse lookup up the tree (possibly crossing MDT
boundaries) in search of a lock already granted to a client or the
root of the tree, whichever is met first. If a lock is met during this
lookup and it happens to be an STL lock, its cookie is returned to the
client along with an indication of the STL lock's presence; otherwise,
operations proceed with normal lock granting.
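The reverse lookup can be sketched as a walk over parent pointers. This
is a simplified single-server illustration: the tree is a plain dict
from fid to parent fid, granted locks are a dict from fid to lock, and
MDT boundaries are ignored. Lock, covering_stl_cookie and both dicts are
hypothetical names, and starting the walk at the accessed fid itself is
an assumption.

```python
from typing import Dict, NamedTuple, Optional


class Lock(NamedTuple):
    is_stl: bool  # True for a subtree lock, False for a normal lock
    cookie: int


def covering_stl_cookie(fid: str,
                        parent_of: Dict[str, str],
                        granted: Dict[str, Lock]) -> Optional[int]:
    """Walk from fid toward the root and stop at the first granted lock.

    Returns the STL cookie if that first lock is an STL lock. Returns
    None if it is a normal lock or if the root is reached without
    meeting any lock, meaning normal lock granting applies.
    """
    node: Optional[str] = fid
    while node is not None:
        lock = granted.get(node)
        if lock is not None:
            return lock.cookie if lock.is_stl else None
        node = parent_of.get(node)  # None once we step past the root
    return None
```

Note that a normal lock met on the way up stops the walk even if an STL
lock sits higher in the tree, which is what lets a normal lock cut a
subtree out of an STL.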
When a client gets an STL lock for itself, it also performs all
subsequent operations by presenting the STL lock handle. It might get
a reply from the server indicating that the entry being accessed is
"shared" (determined by the server as an open file, or an inode on
which any locks are granted to any clients), together with a normal
lock (or, if this area of the tree is covered by somebody else's STL,
that STL's cookie) if needed. All metadata cached on behalf of the STL
lock is marked as such in the client's cache.
This approach allows for a dynamically growing STL tree with the
ability to cut it at any level (by the presence of a lock in some part
of the tree). Initially, once issued, an STL lock would span from the
root of the subtree it was issued on down to any points where other
clients might have cached information (or, if no other clients hold
locks there, the entire subtree), and it is then possible to cut some
of the sub-subtrees out of the subtree protected by the STL. This also
allows for nested STLs held by different clients.
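As a rough illustration of how the span of an STL is "cut" by other
locks, the sketch below computes which entries an STL covers, given the
fids where other clients hold locks. The children map, the cut_points
set and the function name are hypothetical, and the whole tree is
assumed to live on one server.

```python
from typing import Dict, List, Set


def stl_coverage(root: str,
                 children: Dict[str, List[str]],
                 cut_points: Set[str]) -> Set[str]:
    """Return the entries covered by an STL issued on root.

    Coverage extends downward from root and stops at any entry on which
    another client holds a lock; that entry and everything below it are
    excluded from the STL.
    """
    covered: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node != root and node in cut_points:
            continue  # subtree cut out by somebody else's lock
        covered.add(node)
        stack.extend(children.get(node, ()))
    return covered
```

If no other client holds a lock below the root, cut_points is empty and
the result is the entire subtree.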
One important thing that needs to be done in this scenario: we must
ensure that any process with its CWD on Lustre has a lock on that
directory if possible (of course, we cannot refuse revocation of this
lock if other clients want to modify the directory content). This
would allow us to avoid costly reverse lookups to find out whether we
are under any STL lock when we operate from a CWD on Lustre (the STL
lock would just be cut at the CWD point by the normal lock).
We would need to implement cross-MDT lock callbacks.
I think it is safe to depend on clients to provide locks, since if
they don't, or if they provide invalid ones, we can find this out (and
we can couple locks with some secure tokens if needed, too). The only
downside is that rogue clients would be able to slow down servers by
making them do all the reverse lookups (though if we just refuse to
speak with clients that present, on a non-root inode of the FS,
invalid locks that were never present in the system, that should be
somewhat mitigated).
The other alternative is to mark every server dentry with an STL
marker during traversal, but in that case recovery after a server
restart becomes somewhat problematic, so I do not think this is a good
idea.