[Lustre-devel] Agent/Coordinator RPC mechanisms.

Nathaniel Rutman Nathan.Rutman at Sun.COM
Mon Nov 3 12:20:54 PST 2008

Aurelien Degremont wrote:
> Agent/coordinator mechanisms to discuss at the next conf call.
> If you have strong disagreements, do not hesitate to send them now so I
> can modify the mechanisms before the next conf call.
> A - Coordinator/Agent start
> ---
> 1 - MDT starts (Coordinator features are available by default as the
> coordinator reuses MDT threads)
> 2 - Client starts with an agent flag (mount -o agent)
> 3 - Client connects to MDT (piggyback the coordinator registration on
> the MDT connection RPC, with a flag?)
yes, I think so, just use a connect flag
> 4 - If no direct registration, Client sends a registration request to the
> coordinator through the MDT connection after it has been initiated.
don't see a need, unless there's some agent data we want to report at 
> 5 - Agent is ready.
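
A minimal sketch of registration piggybacked on the connect RPC, assuming
the agent mount option simply sets one bit in the connect flags. The names
ConnectFlag, AGENT, and Coordinator are hypothetical and the bit value is
illustrative; they are not Lustre's real connect flags.

```python
from enum import IntFlag


class ConnectFlag(IntFlag):
    # Illustrative bit values, not Lustre's real connect-flag bits.
    NONE = 0
    AGENT = 1 << 0  # hypothetical: client was mounted with -o agent


class Coordinator:
    def __init__(self):
        self.agents = set()  # ids of currently connected agents

    def on_connect(self, client_id, flags):
        """Called from the MDT connect path; registers agents as a side effect."""
        if flags & ConnectFlag.AGENT:
            self.agents.add(client_id)

    def on_disconnect(self, client_id):
        """Drop the agent when its MDT connection goes away."""
        self.agents.discard(client_id)
```

Because registration rides on the connect itself, a reconnect during MDT
recovery would re-register the agent with no separate RPC.
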
> B - Request dispatch
> ---
> 1 - Coordinator receives a request. It writes the migration request to
> its llog file.
> 2 - Coordinator sends a migration request to one of its registered agents.
On the client's reverse import, presumably.  So we need to add a service at
agent startup, probably mdc startup.   No agents on a liblustre client.
> 3 - The agent manages the requests.
> 4 - The agent periodically sends migration status updates to the
> coordinator.
We were talking about the copytool sending updates via file ioctls.
> 5 - When the coordinator receives a finished status, it cleans up its
> llog entry for this migration.
This works for copyin/copyout, but not unlink, since there's no file for
an agent to do an update ioctl on.
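
The dispatch steps above can be sketched as follows, with a dict standing in
for the llog. The class and method names are hypothetical, round-robin agent
selection is an assumption (the thread does not say how an agent is chosen),
and at least one registered agent is assumed. status_update() takes the fid
directly, so the sketch says nothing about how an unlink would report
completion.

```python
import itertools


class Coordinator:
    def __init__(self, agents):
        self.log = {}  # stand-in for the llog: fid -> record
        self._next_agent = itertools.cycle(agents)  # round-robin (assumption)

    def submit(self, fid, request_type):
        """Log the request first, then hand it to an agent."""
        agent = next(self._next_agent)
        self.log[fid] = {
            "request_type": request_type,
            "agent_id": agent,
            "last_status": "queued",
        }
        return agent

    def status_update(self, fid, status):
        """Record progress; drop the log entry once the migration finishes."""
        if fid not in self.log:
            return
        if status == "finished":
            del self.log[fid]
        else:
            self.log[fid]["last_status"] = status
```
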
> C - MDT crash
> ---
> 1 - MDT crashes.
> 2 - MDT is restarted.
> 3 - The coordinator recreates its migration list, reading its llog.
> 4 - The client, when doing its recovery with the MDT, reconnects to the
> coordinator. It also sends the current status of its migrations.
Status is sent by copytools periodically, asynchronously from reconnect.
As far as the copytools/agent is concerned, the MDT restart is invisible.
> 5 - Thanks to this, the coordinator has rebuilt its migration list and
> agent list.
> (as this is standard mdt recovery, this supports failover also)
The agent list is rebuilt at reconnect time.  The migration list is simply
the list of unfinished migrations; the coordinator reads that from the llog
whenever it wants to and decides to restart stuck/broken migrations as
usual.  (E.g. it could read the log once every minute, checking for
last_status_update_time values older than X.)  I don't see any reason it
needs to be in memory all the time.
So logs should contain fid, request type, agent_id (for aborts), 
last_status_update_time, last_status.
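
As an illustration of the record layout and the periodic check described
above, here is a small sketch. The field names come from the list above;
the types, the MigrationRecord and find_stale names, and the use of plain
seconds for timestamps are assumptions.

```python
from dataclasses import dataclass


@dataclass
class MigrationRecord:
    fid: str                        # file identifier
    request_type: str               # e.g. "copyin", "copyout", "unlink"
    agent_id: str                   # kept so an abort can be sent to this agent
    last_status_update_time: float  # seconds (assumed unit)
    last_status: str


def find_stale(records, now, max_age):
    """Return records whose last status update is older than max_age seconds."""
    return [r for r in records if now - r.last_status_update_time > max_age]
```

The coordinator could run find_stale() over records freshly read from the
log on each pass, so nothing has to stay in memory between passes.
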
> E - Client crash
> ---
> 1 - Client crashes
> 2 - MDT notices the client node does not respond anymore. The node is
> evicted, and its migrations are dispatched to other nodes. Node eviction
> (OSSes are supposed to evict it also) prevents the movers on this node from
> continuing their migrations. We could restart them on another agent without
> issue.
2. MDT evicts client
3. Eviction triggers coordinator to immediately re-dispatch all of the
migrations from that agent
4. For copyin, MDT must force any existing agent I/O to stop.  Hmm, but
agents are ignoring the layout lock - how are we going to do this?  Maybe
it's not so bad if two agents are trying to copyin the file at the same
time?  File data is the same...
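
Step 3 can be sketched like this, again with a dict standing in for the
llog. The function name and the round-robin spread over the remaining
agents are assumptions; stopping in-flight copyin I/O (step 4) is not
modelled.

```python
def redispatch_on_eviction(log, evicted, agents):
    """Reassign every migration owned by the evicted agent to remaining agents.

    log maps fid -> record dict with an "agent_id" key (stand-in for the llog).
    Returns the list of fids that were moved.
    """
    remaining = [a for a in agents if a != evicted]
    if not remaining:
        return []  # nobody to take over; leave the records for a later pass
    fids = sorted(fid for fid, rec in log.items() if rec["agent_id"] == evicted)
    for i, fid in enumerate(fids):
        # Spread the orphaned migrations round-robin (assumption).
        log[fid]["agent_id"] = remaining[i % len(remaining)]
    return fids
```
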

F - Copytool crash
Copytool crash is different from a client crash, since the client will not
get evicted
1. Copytool crashes
2. Coordinator periodically scans the list of open migrations for old 
3. Coordinator sends abort signal to old agent
4. Coordinator re-dispatches migration
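
A rough sketch of steps 2 to 4, assuming staleness is judged by
last_status_update_time. The recover_stale name and the pick_agent and
send_abort callbacks are hypothetical; how the abort actually reaches the
old agent is left open.

```python
def recover_stale(log, now, max_age, pick_agent, send_abort):
    """Abort and re-dispatch migrations with no recent status update.

    log maps fid -> record dict (stand-in for the llog).
    pick_agent() returns the agent id to retry on; send_abort(agent_id, fid)
    tells the old agent to stop. Both are caller-supplied (hypothetical).
    """
    restarted = []
    for fid, rec in log.items():
        if now - rec["last_status_update_time"] <= max_age:
            continue  # still making progress
        send_abort(rec["agent_id"], fid)      # step 3: abort on the old agent
        rec["agent_id"] = pick_agent()        # step 4: re-dispatch
        rec["last_status_update_time"] = now  # restart the staleness clock
        rec["last_status"] = "restarted"
        restarted.append(fid)
    return restarted
```
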

