[Lustre-discuss] How To change server recovery timeout

Cliff White Cliff.White at Sun.COM
Wed Nov 7 09:58:35 PST 2007


Wojciech Turek wrote:
> Hi,
> 
> Our lustre environment is:
> 2.6.9-55.0.9.EL_lustre.1.6.3smp
> 
> I would like to change the recovery timeout from the default value of 250s
> to something longer.
> 
> I tried example from manual:
> 
> set_timeout <secs> Sets the timeout (obd_timeout) for a server
> to wait before failing recovery.
> 
> We performed that experiment on our test lustre installation with one OST.
> 
> storage02 is our OSS
> 
> [root@storage02 ~]# lctl dl
>   0 UP mgc MGC10.143.245.3@tcp 31259d9b-e655-cdc4-c760-45d3df426d86 5
>   1 UP ost OSS OSS_uuid 3
>   2 UP obdfilter home-md-OST0001 home-md-OST0001_UUID 7
> [root@storage02 ~]# lctl --device 2 set_timeout 600
> set_timeout has been deprecated. Use conf_param instead.
> e.g. conf_param lustre-MDT0000 obd_timeout=50
> usage: conf_param obd_timeout=<secs>
> 
> run <command> after connecting to device <devno>
> --device <devno> <command [args ...]>
> 
> [root@storage02 ~]# lctl --device 1 conf_param obd_timeout=600
> No device found for name MGS: Invalid argument
> error: conf_param: No such device
> 
> It looks like I need to run this command from the MGS node, so I then moved
> to the MGS server, called storage03.
> 
> [root@storage03 ~]# lctl dl
>   0 UP mgs MGS MGS 9
>   1 UP mgc MGC10.143.245.3@tcp f51a910b-a08e-4be6-5ada-b602a5ca9ab3 5
>   2 UP mdt MDS MDS_uuid 3
>   3 UP lov home-md-mdtlov home-md-mdtlov_UUID 4
>   4 UP mds home-md-MDT0000 home-md-MDT0000_UUID 5
>   5 UP osc home-md-OST0001-osc home-md-mdtlov_UUID 5
> [root@storage03 ~]# lctl device 5
> [root@storage03 ~]# lctl conf_param obd_timeout=600
> error: conf_param: Function not implemented
> [root@storage03 ~]# lctl --device 5 conf_param obd_timeout=600
> error: conf_param: Function not implemented
> 
> [root@storage03 ~]# lctl help conf_param
> conf_param: set a permanent config param. This command must be run on 
> the MGS node
> usage: conf_param <target.keyword=val> ...
> 
> [root@storage03 ~]# lctl conf_param home-md-MDT0000.obd_timeout=600
> error: conf_param: Invalid argument
> [root@storage03 ~]#
> 
> 
> I searched the whole of /proc/*/lustre for a file that could store this
> timeout value, but nothing was found.
> 
> Could someone advise how to change the value of the recovery timeout?
> 
> Cheers,
> 
> Wojciech Turek
> 

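It looks like your file system is named 'home', though the target names in
your lctl dl output (home-md-MDT0000) suggest it may be 'home-md'. You can
confirm with
tunefs.lustre --print <MDS device> | grep "Lustre FS"

The correct command (run on the MGS) would be
# lctl conf_param <fsname>.sys.timeout=<val>
where <fsname> is the name reported on the "Lustre FS" line.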

Example:
[root@ft4 ~]# tunefs.lustre --print /dev/sdb | grep "Lustre FS"
Lustre FS:  lustre
[root@ft4 ~]# cat /proc/sys/lustre/timeout
130
[root@ft4 ~]# lctl conf_param lustre.sys.timeout=150
[root@ft4 ~]# cat /proc/sys/lustre/timeout
150
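
If you want to script this, here is a minimal sketch that builds the
command from the tunefs.lustre output instead of typing the file system
name by hand. The function name timeout_cmd is hypothetical (it is not a
Lustre tool), and it assumes the "Lustre FS:" line looks like the one in
the example above. It only prints the lctl command; it does not run it.

```shell
#!/bin/sh
# Hypothetical helper, not part of Lustre. Reads "tunefs.lustre --print"
# output on stdin and prints the matching conf_param command line.
timeout_cmd() {
    secs=$1
    # Assumption: the file system name follows "Lustre FS:" at line start.
    fsname=$(sed -n 's/^Lustre FS:[[:space:]]*//p' | head -n 1)
    if [ -z "$fsname" ]; then
        echo "no 'Lustre FS' line found in input" >&2
        return 1
    fi
    printf 'lctl conf_param %s.sys.timeout=%s\n' "$fsname" "$secs"
}

# Demonstration with the line from the example above:
printf 'Lustre FS:  lustre\n' | timeout_cmd 150
```

On a real server you would feed it the actual output, e.g.
tunefs.lustre --print /dev/sdb | timeout_cmd 150
then check the printed command before running it on the MGS, and confirm
the result with cat /proc/sys/lustre/timeout as shown above.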

cliffw
