[Lustre-discuss] slow recovery when MDS failed over
Brian J. Murrell
Brian.Murrell at Sun.COM
Thu Aug 7 09:23:18 PDT 2008
On Thu, 2008-08-07 at 12:06 -0400, Brock Palen wrote:
> In doing some testing with our new hardware I did the following:
>
> I rebooted the active MDS server, and it failed over to the second one as
> expected. While this was happening, a client was reset.
>
> When heartbeat brought the MDS up on the new server, it went into
> recovery as expected. The MDS has now been in recovery for 1.5
> hours. I don't think this is normal.
>
> What would cause this? I know that having a client go down (the reset
> above) while the MDS is down but before recovery will cause recovery
> to time out, but 1.5 hours is an unacceptable time to wait for the file
> system to come back.
>
> This is a stock 1.6.5.1 install.
Hrm. Can you provide the syslog from the backup MDS from the time it
was mounted until present?
b.