[Lustre-discuss] Understanding OST recovery_duration info

Andreas Dilger andreas.dilger at oracle.com
Thu Aug 12 11:01:53 PDT 2010


On 2010-08-12, at 10:52, Ms. Megan Larko wrote:
> I am looking at the status of my running Lustre 1.6.7.2_3 system
> (upgrade to 1.8.4 within weeks; impetus to further my education).
> 
> The default timeout value for Lustre is 100 sec.   The default
> recovery time is 2x timeout value.   So I believe our site should have
> a recovery of basically 200 sec.  There are a total of 175 OSTs
> mounted on approximately 60 OSSes.  Because of a hard power failure to
> the facility (the power went out AND the battery backup completely
> failed AND the generator was flakey)  the linux 2.6.16.60-0.42.9
> SLES10SP3 system was booted from a no-power state.
> 
> Lustre worked and the file system recovered just fine.  For education,
> the value for "recovery_duration" in /proc/fs/lustre/obdfilter/{ost
> name}/recovery_status file is between 300 and 600.   Does this mean
> that the actual recovery took between 300 and 600 seconds to
> successfully complete?

Right.  That is the actual total recovery time.
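
As an illustration only, here is a minimal Python sketch for collecting that value from every OST on an OSS. It assumes the recovery_status file consists of simple "key: value" lines and uses the /proc path quoted above. The function names are hypothetical and not part of any Lustre tool.

```python
import glob


def parse_recovery_status(text):
    """Parse 'key: value' lines into a dict of strings (assumed file layout)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            fields[key.strip()] = value.strip()
    return fields


def recovery_durations(pattern="/proc/fs/lustre/obdfilter/*/recovery_status"):
    """Map each OST name to its recovery_duration in seconds, when reported."""
    durations = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            fields = parse_recovery_status(f.read())
        try:
            seconds = int(fields["recovery_duration"])
        except (KeyError, ValueError):
            continue  # field missing or not a plain integer
        ost_name = path.split("/")[-2]  # directory name is the OST name
        durations[ost_name] = seconds
    return durations


if __name__ == "__main__":
    for ost, seconds in recovery_durations().items():
        print(f"{ost}: {seconds}s")
```

Run on each OSS, this prints one line per OST, which makes it easy to compare the measured durations against the expected 2x timeout figure.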

> If yes, should the Lustre timeout default value be higher?

No, because even though the base timeout is 100s, there are reasons to
extend the recovery window (e.g. new clients continuing to connect will
indicate to the OSS that there may still be more missing clients having
trouble connecting for some reason).

> Is all of this moot under Lustre 1.8.4 and adaptive timeouts?

Not totally.  There is no fixed recovery window in 1.8, but the basic
concepts are largely the same.  During normal operation the adaptive
timeout varies (between 5s and 900s by default), and its current value
is used as the base number when recovery starts.

On my simple home system with 3 clients and 1 server (MDT + 5 OSTs),
Lustre recovery after a hard server reboot completed in 27s (not
counting server restart time), so adaptive timeouts (AT) can definitely
make a difference.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.



