[Lustre-discuss] Understanding OST recovery_duration info

Ms. Megan Larko dobsonunit at gmail.com
Thu Aug 12 09:52:10 PDT 2010


Hello,

I am looking at the status of my running Lustre 1.6.7.2_3 system.  We
plan to upgrade to 1.8.4 within weeks, which is my impetus to further
my education.

The default timeout value for Lustre is 100 sec.  The default
recovery time is 2x the timeout value, so I believe our site should
have a recovery window of roughly 200 sec.  There are a total of 175
OSTs mounted on approximately 60 OSSes.  Because of a hard power
failure at the facility (the power went out AND the battery backup
completely failed AND the generator was flaky), the Linux
2.6.16.60-0.42.9 SLES10SP3 system was booted from a no-power state.

Lustre worked and the file system recovered just fine.  For my own
education: the value of "recovery_duration" in the
/proc/fs/lustre/obdfilter/{ost name}/recovery_status file is between
300 and 600.  Does this mean that the actual recovery took between
300 and 600 seconds to complete successfully?  If yes, should the
Lustre timeout default value be higher?  Is all of this moot under
Lustre 1.8.4 and adaptive timeouts?
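
To make the comparison concrete, here is a minimal sketch of how the
recovery_duration values could be collected from every OST on an OSS
and set against the 2x timeout figure mentioned above.  It assumes
that each recovery_status file consists of "key: value" lines and
that recovery_duration is a whole number; I have not confirmed the
exact layout for every Lustre release.  The function names and the
glob pattern are my own, chosen for illustration only.

```python
import glob


def parse_recovery_status(text):
    """Parse 'key: value' lines into a dict of strings (assumed layout)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def expected_recovery_window(timeout_sec=100, factor=2):
    """Recovery window as described in this message: factor x timeout."""
    return timeout_sec * factor


def report(pattern="/proc/fs/lustre/obdfilter/*/recovery_status"):
    """Print recovery_duration for each OST next to the expected window."""
    window = expected_recovery_window()
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            fields = parse_recovery_status(handle.read())
        duration = fields.get("recovery_duration", "")
        if not duration.isdigit():
            continue  # field missing or not a plain integer; skip this OST
        note = "above" if int(duration) > window else "within"
        print(f"{path}: recovery_duration={duration} ({note} {window})")


if __name__ == "__main__":
    report()
```

Run on an OSS, this would print one line per OST.  On a machine with
no Lustre /proc entries the glob matches nothing and the script
prints nothing.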

I appreciate the time taken to enlighten me.  Smile!

Cheers!
M Larko


