[Lustre-discuss] Lustre DRBD failover time
Cliff White
Cliff.White at Sun.COM
Tue Jul 14 09:42:54 PDT 2009
tao.a.wu at nokia.com wrote:
>
> Hi, all,
>
> I am evaluating Lustre with DRBD failover, and I see about 2
> minutes of OSS failover time when switching to the secondary node. Has
> anyone made a similar observation (so that we can conclude this is
> expected), or are there parameters I should tune to
> reduce that time?
>
> I have a simple setup: the MDS and OSS0 are hosted on server1, and OSS1
> is hosted on server2. OSS0 and OSS1 are the primary nodes for OST0 and
> OST1, respectively, and the OSTs are replicated using DRBD (protocol C)
> to the other machine. The two OSTs are about 73GB each. I am running
> Lustre 1.6 + DRBD 8 + Heartbeat v2 (but using a v1 configuration).
>
> From the HA logs, it looks like Heartbeat noticed the node was down within 10
> seconds (which is consistent with the deadtime of 6 seconds). Where does
> the secondary node spend the remaining 100-110 seconds? There was a
> post
> (http://groups.google.com/group/lustre-discuss-list/msg/bbbeac047df678ca?dmode=source)
> attributing MDS failover time to fsck. Could that also be the cause of my problem?
As Brian mentioned, Lustre servers go through a recovery process.
You need to examine the system logs on the OSS - if Lustre is in recovery,
there will be messages in the logs explaining this.
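A quick way to check this on the OSS is to read the OST's recovery_status. This is a sketch: the /proc path is how Lustre 1.6 exposes it, and the OST name lustre-OST0000 is a placeholder for whatever your OST is called.

```shell
# Show whether the OST is still in recovery, and how many clients
# have reconnected so far (OST name is a placeholder):
cat /proc/fs/lustre/obdfilter/lustre-OST0000/recovery_status

# Equivalently, via lctl (matches all OSTs on this OSS):
lctl get_param obdfilter.*.recovery_status

# Recovery start/completion messages also land in the kernel log:
dmesg | grep -i recov
```

While the status reads RECOVERING, the OST is waiting for previously connected clients to reconnect and replay their transactions; the wait is bounded by a window derived from obd_timeout (100 seconds by default in 1.6), which would line up with the 100-110 seconds you are seeing.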
cliffw
> Thanks,
>
> -Tao
> ------------------------------------------------------------------------
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss