[Lustre-devel] SOM safety

dzogin Dmitri.Zoguine at Sun.COM
Wed Jan 6 10:10:40 PST 2010

Nicolas Williams wrote:
> The health network will allow for eviction notices to be spread around
> the cluster quickly.
> I think we'll need a separate cluster membership capability for reasons
> having to do with optimizing the health network: if you see a peer C
> that's got a membership capability issued at time T_a and you're a
> server S_n that's been in the cluster since before T_a and you've not
> heard any eviction notices for C, then C is still a member of the
> cluster.  Without a cluster membership capability we'd need to ask the
> health network if C is a member, and while that can happen quickly, in a
> mostly-stateless health network (the current design) having every server
> ask about the membership/liveness status of every peer client could
> result in a load spike.
I think this can be implemented easily as bitmaps on every server (both
OSS and MDS) that keep track of the alive and evicted clients. Once a
server has finished all the work that needs to be done for an evicted
client, it clears that client's bit in the evicted-clients map. Instead
of sending information about every client, servers can exchange the
bitmaps; XORing two bitmaps shows the discrepancies in client evictions
directly.
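
A rough sketch of the XOR comparison, not taken from the Lustre tree.
MAX_CLIENTS, the map_* helpers, and the idea that a client is identified
by a small integer index are all assumptions made for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CLIENTS 1024                 /* hypothetical cluster-wide limit */
#define MAP_WORDS   (MAX_CLIENTS / 64)   /* 64 client bits per word */

/* Mark client `id` in a bitmap (e.g. when it is evicted). */
static void map_set(uint64_t *map, unsigned id)
{
    assert(id < MAX_CLIENTS);
    map[id / 64] |= (uint64_t)1 << (id % 64);
}

/* Clear client `id` (e.g. once its eviction work has been processed). */
static void map_clear(uint64_t *map, unsigned id)
{
    assert(id < MAX_CLIENTS);
    map[id / 64] &= ~((uint64_t)1 << (id % 64));
}

/* Return non-zero if the bit for client `id` is set. */
static int map_test(const uint64_t *map, unsigned id)
{
    assert(id < MAX_CLIENTS);
    return (int)((map[id / 64] >> (id % 64)) & 1);
}

/*
 * XOR the evicted maps of two servers. Each set bit in `diff` is a
 * client that one server considers evicted and the other does not.
 * Returns the number of such clients.
 */
static unsigned map_diff(const uint64_t *a, const uint64_t *b, uint64_t *diff)
{
    unsigned count = 0;

    for (size_t i = 0; i < MAP_WORDS; i++) {
        diff[i] = a[i] ^ b[i];
        /* count set bits by clearing the lowest one each iteration */
        for (uint64_t w = diff[i]; w != 0; w &= w - 1)
            count++;
    }
    return count;
}
```

With 1024 clients each map is 128 bytes, so exchanging whole maps stays
cheap compared with one message per client.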

I think the same idea of bitmaps can be implemented for objects, if we 
want to track which client is updating the object/stripe.
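
As a sketch of the per-object variant, each object (or stripe) could
carry its own bitmap of the clients currently updating it. The names
below (struct obj_writers, OBJ_MAX_CLIENTS, the obj_* helpers) are
hypothetical, and checking the writers against an evicted-clients map is
one possible use I am assuming, not something settled in this thread:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OBJ_MAX_CLIENTS 256                      /* hypothetical limit */
#define OBJ_MAP_WORDS   (OBJ_MAX_CLIENTS / 64)

/* One bit per client that is currently updating this object/stripe. */
struct obj_writers {
    uint64_t bits[OBJ_MAP_WORDS];
};

/* Record that client `id` has started updating the object. */
static void obj_writer_add(struct obj_writers *o, unsigned id)
{
    assert(id < OBJ_MAX_CLIENTS);
    o->bits[id / 64] |= (uint64_t)1 << (id % 64);
}

/* Record that client `id` has finished updating the object. */
static void obj_writer_del(struct obj_writers *o, unsigned id)
{
    assert(id < OBJ_MAX_CLIENTS);
    o->bits[id / 64] &= ~((uint64_t)1 << (id % 64));
}

/*
 * AND the object's writer map with a server's evicted-clients map.
 * Returns non-zero if any client updating the object has been evicted.
 */
static int obj_has_evicted_writer(const struct obj_writers *o,
                                  const uint64_t *evicted)
{
    for (size_t i = 0; i < OBJ_MAP_WORDS; i++) {
        if (o->bits[i] & evicted[i])
            return 1;
    }
    return 0;
}
```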

