[Lustre-discuss] How To change server recovery timeout
Cliff White
Cliff.White at Sun.COM
Wed Nov 7 09:58:35 PST 2007
Wojciech Turek wrote:
> Hi,
>
> Our lustre environment is:
> 2.6.9-55.0.9.EL_lustre.1.6.3smp
>
> I would like to change the recovery timeout from the default value of 250s
> to something longer.
>
> I tried the example from the manual:
>
> set_timeout <secs> Sets the timeout (obd_timeout) for a server
> to wait before failing recovery.
>
> We performed that experiment on our test lustre installation with one OST.
>
> storage02 is our OSS
>
> [root@storage02 ~]# lctl dl
> 0 UP mgc MGC10.143.245.3@tcp 31259d9b-e655-cdc4-c760-45d3df426d86 5
> 1 UP ost OSS OSS_uuid 3
> 2 UP obdfilter home-md-OST0001 home-md-OST0001_UUID 7
> [root@storage02 ~]# lctl --device 2 set_timeout 600
> set_timeout has been deprecated. Use conf_param instead.
> e.g. conf_param lustre-MDT0000 obd_timeout=50
> usage: conf_param obd_timeout=<secs>
>
> run <command> after connecting to device <devno>
> --device <devno> <command [args ...]>
>
> [root@storage02 ~]# lctl --device 1 conf_param obd_timeout=600
> No device found for name MGS: Invalid argument
> error: conf_param: No such device
>
> It looks like I need to run this command from the MGS node, so I moved to
> the MGS server, called storage03:
>
> [root@storage03 ~]# lctl dl
> 0 UP mgs MGS MGS 9
> 1 UP mgc MGC10.143.245.3@tcp f51a910b-a08e-4be6-5ada-b602a5ca9ab3 5
> 2 UP mdt MDS MDS_uuid 3
> 3 UP lov home-md-mdtlov home-md-mdtlov_UUID 4
> 4 UP mds home-md-MDT0000 home-md-MDT0000_UUID 5
> 5 UP osc home-md-OST0001-osc home-md-mdtlov_UUID 5
> [root@storage03 ~]# lctl device 5
> [root@storage03 ~]# lctl conf_param obd_timeout=600
> error: conf_param: Function not implemented
> [root@storage03 ~]# lctl --device 5 conf_param obd_timeout=600
> error: conf_param: Function not implemented
>
> [root@storage03 ~]# lctl help conf_param
> conf_param: set a permanent config param. This command must be run on
> the MGS node
> usage: conf_param <target.keyword=val> ...
>
> [root@storage03 ~]# lctl conf_param home-md-MDT0000.obd_timeout=600
> error: conf_param: Invalid argument
> [root@storage03 ~]#
>
>
> I searched all of /proc/*/lustre for a file that could store this timeout
> value, but found nothing.
>
> Could someone advise how to change the value of the recovery timeout?
>
> Cheers,
>
> Wojciech Turek
>
It looks like your file system is named 'home'. You can confirm this with:
tunefs.lustre --print <MDS device> | grep "Lustre FS"
The correct command (run on the MGS) would be:
# lctl conf_param home.sys.timeout=<val>
Example:
[root@ft4 ~]# tunefs.lustre --print /dev/sdb | grep "Lustre FS"
Lustre FS: lustre
[root@ft4 ~]# cat /proc/sys/lustre/timeout
130
[root@ft4 ~]# lctl conf_param lustre.sys.timeout=150
[root@ft4 ~]# cat /proc/sys/lustre/timeout
150
150
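
If you want to script the two steps above, something like the following
might work as a starting point. It is only a sketch: the function names
fsname_from_print and timeout_cmd are made up for illustration, and it
assumes the tunefs.lustre --print output contains a "Lustre FS:" line as in
the example above. It prints the conf_param command rather than running it,
so you can check the file system name before applying it on the MGS.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Lustre.

# Read `tunefs.lustre --print` output on stdin and print the value of the
# first "Lustre FS:" line (assumed format: "Lustre FS:  <fsname>").
fsname_from_print() {
    sed -n 's/^[[:space:]]*Lustre FS:[[:space:]]*//p' | head -n 1
}

# Print the conf_param command for a given fsname and timeout.
#   $1 = file system name, $2 = timeout in seconds
timeout_cmd() {
    printf 'lctl conf_param %s.sys.timeout=%s\n' "$1" "$2"
}

# Example use (the device path is a placeholder; run on the MGS):
#   fsname=$(tunefs.lustre --print /dev/sdb | fsname_from_print)
#   timeout_cmd "$fsname" 600
#   cat /proc/sys/lustre/timeout    # check the value after running the command
```

Printing the command instead of executing it is deliberate, since conf_param
sets a permanent parameter.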
cliffw