[Lustre-discuss] Suspend/resume of Linux jobs

Brian J. Murrell Brian.Murrell at Sun.COM
Fri Mar 6 20:38:15 PST 2009


On Fri, 2009-03-06 at 16:56 -0800, Peter Bojanic wrote:
> Hi,

Hi Peter,

From what I understand about locking (and won't at all be surprised to
get at least partially corrected here, but I will give it a shot
anyway)...

> if a user suspends a job in Linux (^Z) is the  
> filesystem at all affected wrt revoking locks, etc. or does that  
> happen reliably and unaffected in the VFS layer?

Lustre's locking happens reliably and transparently to the application
so whether it's suspended or just "sleep"ing for a long time,
Lustre should continue to operate as it normally would.

If an application were to open a file for write and there were no
(lustre) locks on that file already, the lustre client should be
successful in obtaining a lock (i.e. permission) to write to that file
in its entirety -- IIRC, the lustre client tries to be as aggressive as
it can in getting the biggest locks it can.

So, the lustre client now holds that lock and the user suspends the
application while it still has the file open for write (just to be
absolutely clear on what the state of the lock should be).  That suspend
is just a user-space operation.  Nothing in the kernel is likewise
suspended.  So let's say while the application is suspended, some other
client wants to write that file.  The lock will be revoked on the client
with the suspended application and because the kernel resources held on
behalf of that client are not also suspended, the lock is given up by
the client and the other client is allowed to proceed with his lock and
write.
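
To illustrate the "suspend is just a user-space operation" point, here
is a minimal sketch in Python (POSIX only). It is an illustration on
whatever filesystem holds the temp file, not a test of Lustre locking,
and the function name suspend_resume_demo is made up for this example.
A child process stands in for the application, the parent stops it with
SIGSTOP (^Z actually sends SIGTSTP, which has the same stopping effect
by default), a second writer touches the file while the child is
stopped, and the child then resumes and writes on the same descriptor.

```python
import os
import signal
import tempfile


def suspend_resume_demo():
    """Stop a writer process, write from elsewhere, then resume it."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    ready_r, ready_w = os.pipe()   # child -> parent: "first write done"
    go_r, go_w = os.pipe()         # parent -> child: "do the second write"

    pid = os.fork()
    if pid == 0:
        # Child: the "application" that keeps the file open for write.
        try:
            out = os.open(path, os.O_WRONLY | os.O_APPEND)
            os.write(out, b"before\n")
            os.write(ready_w, b"1")
            os.read(go_r, 1)           # the child is stopped while blocked here
            os.write(out, b"after\n")  # same descriptor, after resume
            os.close(out)
        finally:
            os._exit(0)

    os.read(ready_r, 1)
    # ^Z sends SIGTSTP; SIGSTOP is used here because it cannot be caught.
    os.kill(pid, signal.SIGSTOP)
    _, status = os.waitpid(pid, os.WUNTRACED)
    stopped = os.WIFSTOPPED(status)

    # Another writer touches the file while the application is stopped.
    # On Lustre this would typically be a process on a different client.
    other = os.open(path, os.O_WRONLY | os.O_APPEND)
    os.write(other, b"other\n")
    os.close(other)

    os.kill(pid, signal.SIGCONT)
    os.write(go_w, b"1")
    os.waitpid(pid, 0)

    with open(path, "rb") as f:
        data = f.read()
    os.unlink(path)
    for p in (ready_r, ready_w, go_r, go_w):
        os.close(p)
    return stopped, data
```

The stopped process never notices the other writer. Its descriptor
stays valid across the stop and its next write() simply goes through.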

Now if the application that was suspended is resumed, before it would be
able to proceed with a write on that open file descriptor, lustre would
have to reclaim that lock and the application's I/O would wait for that
to happen -- the application doesn't see any of this though.  It just
does its write() and lustre takes it from there and does the required
locking prior to actually writing.
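
The whole sequence can be sketched as a toy model. To be clear, this is
not Lustre's lock manager or any real Lustre API. ToyLockServer,
ToyClient and run_scenario are hypothetical names, and a single
whole-file lock is assumed. The only point is that the revoke is
handled on the client side regardless of whether the application is
suspended, and that the lock is re-acquired transparently on the next
write.

```python
class ToyLockServer:
    """Hypothetical stand-in for a lock server; not Lustre's real API."""

    def __init__(self):
        self.holder = None
        self.log = []

    def acquire(self, client):
        # If another client holds the lock, ask it to give the lock up first.
        if self.holder is not None and self.holder is not client:
            self.holder.on_revoke(self)
        self.holder = client
        self.log.append(f"{client.name} granted")


class ToyClient:
    def __init__(self, name):
        self.name = name
        self.app_suspended = False  # state of the user-space process only

    def on_revoke(self, server):
        # Handled by the client's "kernel" side; app_suspended is not consulted.
        server.log.append(f"{self.name} released")
        server.holder = None

    def write(self, server):
        if self.app_suspended:
            raise RuntimeError("a suspended application cannot call write()")
        if server.holder is not self:
            server.acquire(self)  # transparent (re)acquire before writing
        server.log.append(f"{self.name} wrote")


def run_scenario():
    server = ToyLockServer()
    a, b = ToyClient("A"), ToyClient("B")
    a.write(server)          # A gets the lock and writes
    a.app_suspended = True   # user hits ^Z on A's application
    b.write(server)          # B's request revokes A's lock
    a.app_suspended = False  # user resumes A's application
    a.write(server)          # A re-acquires the lock, then writes
    return server.log
```

Running run_scenario() returns the event log in order: A is granted the
lock and writes, A releases it for B while suspended, and after resume
A gets it back from B before its second write.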

b.


