[Lustre-discuss] slow recovery when MDS failed over

Andreas Dilger adilger at sun.com
Mon Aug 18 20:02:36 PDT 2008


On Aug 07, 2008  12:06 -0400, Brock Palen wrote:
> When heartbeat brought the MDS up on the new server, it went into  
> recovery as expected.  The MDS has now been in recovery for 1.5  
> hours.  I don't think this is normal.
> 
> What would cause this?  I know that having a client go down (the reset  
> above) while the MDS is down, but before recovery, will cause recovery  
> to time out, but 1.5 hours is an unacceptable amount of time to wait  
> for the file system to come back.

The recovery should time out in about 5 minutes if the clients do not
reply.  Something is definitely wrong.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
