[Lustre-devel] Sub Tree lock ideas.

Wed Jan 21 12:49:49 PST 2009

Hello!

    We discussed a bit of this in Beijing last week, but decided to  
continue the discussion via email.

    So, I think it is a given we do not want to revoke a subtree lock  
every time somebody steps through it, because that will be too costly  
in a lot of cases.

    Anyway, here is what I have in mind.

    STL locks could be granted by the server regardless of whether the
client requested them.

    We would require clients to provide a lock "cookie" with every
operation they perform; in the normal case that would be a handle they
hold on the parent directory.
    It should be possible to tell from the cookie which server it
originates from (needed for CMD support).
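
    To make the "find the originating server" requirement concrete,
here is a minimal sketch. It assumes, purely for illustration, that a
cookie is a 64-bit value with the issuing MDT index in the top 16 bits
and a server-local lock handle in the low 48 bits. The field widths
and the names make_cookie, cookie_mdt and cookie_handle are
hypothetical, not existing Lustre structures.

```python
MDT_BITS = 16      # assumed width of the MDT index field
HANDLE_BITS = 48   # assumed width of the server-local lock handle


def make_cookie(mdt_index: int, handle: int) -> int:
    """Pack the issuing MDT index and a local lock handle into one cookie."""
    if not 0 <= mdt_index < (1 << MDT_BITS):
        raise ValueError("mdt_index out of range")
    if not 0 <= handle < (1 << HANDLE_BITS):
        raise ValueError("handle out of range")
    return (mdt_index << HANDLE_BITS) | handle


def cookie_mdt(cookie: int) -> int:
    """Recover which MDT issued the cookie (needed to route callbacks)."""
    return cookie >> HANDLE_BITS


def cookie_handle(cookie: int) -> int:
    """Recover the server-local lock handle."""
    return cookie & ((1 << HANDLE_BITS) - 1)
```

    With a layout like this, any server receiving a cookie can tell
from cookie_mdt() alone where a callback has to be forwarded.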

    When a different client steps into an area covered by an STL lock,
that client would get the STL lock's cookie and would start presenting
it with all subsequent operations (along with a special flag meaning
that the client is not operating within the STL).

    When the server receives a request with a cookie that turns out to
belong to an STL lock, it makes a callback to that lock (through
another server if necessary, in the CMD case) and includes the
currently accessed fid and the access mode. The client that receives
the callback does the necessary writeout of the object content: it
flushes dirty data in the case of a file, and flushes any metadata
changes in the case of a directory (needed for the metadata writeback
cache; this would be a server no-op for r/o access to directories
until WBC is implemented). Beyond that, if the operation is modifying,
the STL-holding client would have to release the STL lock, and would
have a choice of either completely flushing its cache for the subtree
protected by the STL, or obtaining STLs for parts of the tree below
the STL and retaining its cache for those subtrees.

    Additionally, for r/o access the STL-holding client would have the
extra choices of doing nothing (besides the writeout of the object
content) or allowing the server to issue a lock on that fid, in which
case the client would first flush its own cache for the entire subtree
starting at that fid.
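
    The choices open to the STL holder when a callback arrives can be
summed up in a small sketch. The names Access, Choice and
stl_callback_actions are hypothetical, the steps are plain strings
rather than real operations, and the relative order of "release" and
"obtain STLs lower in the tree" is my assumption; the only ordering
stated above is that the cache below the fid is flushed before the
server issues a lock on it.

```python
from enum import Enum, auto


class Access(Enum):
    READ = auto()
    MODIFY = auto()


class Choice(Enum):
    FLUSH_ALL = auto()      # release the STL, drop the whole subtree cache
    KEEP_SUBTREES = auto()  # release the STL, take STLs lower in the tree
    DO_NOTHING = auto()     # r/o only: keep the STL as it is
    ALLOW_LOCK = auto()     # r/o only: let the server lock the accessed fid


def stl_callback_actions(access: Access, choice: Choice) -> list:
    """Return the steps the STL holder takes for one callback, in order."""
    # Writeout of the accessed object happens in every case.
    steps = ["writeout object"]
    if choice is Choice.FLUSH_ALL:
        steps += ["release STL", "flush subtree cache"]
    elif choice is Choice.KEEP_SUBTREES:
        steps += ["obtain STLs for subtrees", "release STL"]
    elif access is Access.MODIFY:
        # The remaining choices keep the STL, which a modifying
        # access does not permit.
        raise ValueError("a modifying access forces release of the STL")
    elif choice is Choice.ALLOW_LOCK:
        steps += ["flush cache below fid", "allow lock on fid"]
    return steps
```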

    If the lock cookie presented by the accessing client is determined
to be invalid (rogue client, or the lock was already released), the
server performs a reverse lookup up the tree (possibly crossing MDT
boundaries) in search of a lock already granted to a client or the
root of the tree, whichever is met first. If a lock is met during this
lookup and it happens to be an STL lock, its cookie is returned to the
client along with an indication of the STL lock's presence; otherwise
normal operations with normal lock granting occur.
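
    The reverse lookup could look roughly like the sketch below. It
models the namespace as a plain child-to-parent map on a single server
(so no MDT crossing) and assumes the walk starts at the accessed
object itself; the Lock tuple and the reverse_lookup name are made up
for the example.

```python
from typing import Dict, NamedTuple, Optional


class Lock(NamedTuple):
    cookie: int
    is_stl: bool


def reverse_lookup(fid: str,
                   parent: Dict[str, Optional[str]],
                   locks: Dict[str, Lock]) -> Optional[int]:
    """Walk from fid toward the root and stop at the first granted lock.

    Returns the cookie of that lock if it is an STL. Returns None if the
    first lock met is a normal one or the root is reached without meeting
    any lock, meaning normal operation with normal lock granting.
    """
    node: Optional[str] = fid
    while node is not None:
        lock = locks.get(node)
        if lock is not None:
            return lock.cookie if lock.is_stl else None
        node = parent[node]   # the root maps to None and ends the walk
    return None
```

    The cost is proportional to the depth of the accessed object,
which is why it matters how often clients can force this path.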

    When a client gets an STL lock for itself, it also performs all
subsequent operations by presenting the STL lock handle. It might get
a reply from the server indicating that the entry being accessed is
"shared" (determined by the server as an opened file, or an inode on
which any locks are granted to any clients), together with a normal
lock if needed (or, if this area of the tree is covered by somebody
else's STL, that STL's cookie). All metadata cached on behalf of an
STL lock is marked as such in the client's cache.

    This approach allows for a dynamically growing STL tree with the
ability to cut it at any level (by the presence of a lock in some part
of the tree). Initially, once issued, an STL lock would span from the
root of the subtree it was issued on down to any points where other
clients might have cached information (or the entire subtree, if no
other clients hold locks there), and after that it is possible to cut
some of the subsubtrees out of the subtree protected by the STL. This
also allows for nested STLs held by different clients.

    One important thing that needs to be done in this scenario is to
ensure that any process with a CWD on lustre has a lock on that
directory if possible (of course we cannot refuse revocation of this
lock if other clients want to modify the directory content). This
would allow us to avoid costly reverse lookups to find out whether we
are under any STL lock when we operate from a CWD on lustre (the STL
lock would just be cut at the CWD point by the normal lock).

    We would need to implement cross-MDT lock callbacks.

    I think it is safe to depend on clients to provide locks, since if
they don't, or if they provide invalid ones, we can find this out (and
we can couple locks with some secure tokens if needed, too). The only
downside is that rogue clients would be able to slow down servers by
making them do all the reverse lookups (though if we just refuse to
speak with clients that present invalid locks that were never present
in the system on a non-root inode of the FS, that should be somewhat
mitigated).
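
    One possible reading of "couple locks with some secure tokens" is
an HMAC over the cookie, computed with a secret known only to the
servers. This is just a sketch under that assumption: the choice of
HMAC-SHA256, binding the token to a client id, and the function names
are all mine, not something decided above.

```python
import hashlib
import hmac


def sign_cookie(secret: bytes, client_id: str, cookie: int) -> bytes:
    """Token the server hands out together with a lock cookie."""
    msg = f"{client_id}:{cookie}".encode()
    return hmac.new(secret, msg, hashlib.sha256).digest()


def cookie_is_genuine(secret: bytes, client_id: str,
                      cookie: int, token: bytes) -> bool:
    """True only if this server side issued the cookie to this client."""
    expected = sign_cookie(secret, client_id, cookie)
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(expected, token)
```

    A cookie that fails this check was never issued by us, so the
request can be refused without a reverse lookup; a cookie that passes
but refers to an already released lock would still need the lookup.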

    The other alternative is to mark every server dentry with an STL
marker during traversal, but in that case recovery after a server
restart becomes somewhat problematic, so I do not think this is a
good idea.

Bye,
     Oleg