[Lustre-discuss] Regarding redundancy

Brian J. Murrell Brian.Murrell at Sun.COM
Tue Apr 7 05:14:55 PDT 2009


On Tue, 2009-04-07 at 09:20 +0200, Arne Wiebalck wrote:
> Brian,

Arne,

> what about if you have multiple clients, all having transactions with
> the OSS open. Now the OSS goes down and comes back. From what I
> understand, the server goes into recovery and rejects new connections 
> before recovery is finished (correct?).

Correct.

> What if all but one client
> reconnect, i.e. you lose one client: are the transactions of the
> successfully reconnected clients replayed or are they discarded?

If the lost client has a transaction that needs to be replayed, all of
the transactions up to that missing transaction are replayed.  All
subsequent transactions are discarded, and recovery is aborted when the
recovery timer expires.
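
To make the ordering rule concrete, here is a toy model in Python.  It is
only a sketch of the behaviour described above, not Lustre code: the
function name, the (transno, client) tuples and the client names are all
made up for illustration.

```python
def replay_without_vbr(transactions, reconnected):
    """Toy model of replay with one or more clients missing.

    transactions: iterable of (transno, client) pairs (hypothetical layout).
    reconnected:  set of client names that came back after the restart.
    Returns (replayed, discarded) lists of transaction numbers.
    """
    replayed, discarded = [], []
    gap_seen = False
    for transno, client in sorted(transactions):
        if client not in reconnected:
            # This transaction belongs to the lost client, so it can
            # never be replayed: it is the first gap in the sequence.
            gap_seen = True
            continue
        if gap_seen:
            # Everything after the gap is thrown away, even though the
            # owning client reconnected successfully.
            discarded.append(transno)
        else:
            replayed.append(transno)
    return replayed, discarded


# Clients A and B reconnect, client C is lost.
txns = [(1, "A"), (2, "B"), (3, "C"), (4, "A"), (5, "B")]
replayed, discarded = replay_without_vbr(txns, {"A", "B"})
print(replayed, discarded)
```

In this example transactions 1 and 2 are replayed, 3 is the missing one,
and 4 and 5 are discarded although A and B reconnected.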

The semantics of this will change when VBR (version-based recovery)
becomes available in 1.8.something, where something might even be 0.
In that case, only transactions that actually depend on the missing
transactions will be discarded.
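
The difference under VBR can be sketched the same way.  Again this is a
toy model, not Lustre code: how dependencies are really detected is not
covered in this message, so the explicit depends_on list below is purely
an assumption for illustration, as are all the names.

```python
def replay_with_vbr(transactions, reconnected):
    """Toy model of dependency-aware replay.

    transactions: iterable of (transno, client, depends_on) tuples, where
                  depends_on lists earlier transnos (hypothetical layout).
    reconnected:  set of client names that came back after the restart.
    Returns (replayed, discarded) lists of transaction numbers.
    """
    unavailable = set()   # transnos that were lost or discarded so far
    replayed, discarded = [], []
    for transno, client, depends_on in sorted(transactions, key=lambda t: t[0]):
        if client not in reconnected:
            # Lost with its client; nobody is left to replay it.
            unavailable.add(transno)
        elif unavailable & set(depends_on):
            # Depends, directly or through an earlier discard, on a
            # missing transaction, so it cannot be replayed either.
            unavailable.add(transno)
            discarded.append(transno)
        else:
            replayed.append(transno)
    return replayed, discarded


# Clients A and B reconnect, client C is lost.
vbr_txns = [
    (1, "A", []),
    (2, "B", [1]),
    (3, "C", []),
    (4, "A", [3]),   # depends on the lost transaction 3
    (5, "B", [2]),   # independent of the lost client
    (6, "A", [4]),   # depends on 4, which was discarded
]
vbr_replayed, vbr_discarded = replay_with_vbr(vbr_txns, {"A", "B"})
print(vbr_replayed, vbr_discarded)
```

Here transaction 5 survives because nothing it depends on is missing,
while 4 and 6 are discarded.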

> Independent from the load? I think the 'official' statement was that the
> cluster has to be quiescent, i.e. no client activity. Is that (still)
> true?

Yes, that is the official statement, and I don't think any further
testing has been done to change it officially.  The general feeling is
that quiescence should not be necessary, but we just don't have the
scientific testing to be assured of that.

So if you want to be safe, quiesce the filesystem first.  :-)

b.
