[Lustre-discuss] How To change server recovery timeout

Wojciech Turek wjt27 at cam.ac.uk
Wed Nov 7 03:24:12 PST 2007


Hi,

Our Lustre environment is:
2.6.9-55.0.9.EL_lustre.1.6.3smp

I would like to change the recovery timeout from its default value of 250s to
something longer.

I tried the example from the manual:

set_timeout <secs> Sets the timeout (obd_timeout) for a server
to wait before failing recovery.

We performed that experiment on our test Lustre installation, which has one
OST.

storage02 is our OSS

[root@storage02 ~]# lctl dl
   0 UP mgc MGC10.143.245.3@tcp 31259d9b-e655-cdc4-c760-45d3df426d86 5
   1 UP ost OSS OSS_uuid 3
   2 UP obdfilter home-md-OST0001 home-md-OST0001_UUID 7
[root@storage02 ~]# lctl --device 2 set_timeout 600
set_timeout has been deprecated. Use conf_param instead.
e.g. conf_param lustre-MDT0000 obd_timeout=50
usage: conf_param obd_timeout=<secs>

run <command> after connecting to device <devno>
--device <devno> <command [args ...]>

[root@storage02 ~]# lctl --device 1 conf_param obd_timeout=600
No device found for name MGS: Invalid argument
error: conf_param: No such device

It looks like I need to run this command from the MGS node, so I then moved
to the MGS server, which is called storage03.

[root@storage03 ~]# lctl dl
   0 UP mgs MGS MGS 9
   1 UP mgc MGC10.143.245.3@tcp f51a910b-a08e-4be6-5ada-b602a5ca9ab3 5
   2 UP mdt MDS MDS_uuid 3
   3 UP lov home-md-mdtlov home-md-mdtlov_UUID 4
   4 UP mds home-md-MDT0000 home-md-MDT0000_UUID 5
   5 UP osc home-md-OST0001-osc home-md-mdtlov_UUID 5
[root@storage03 ~]# lctl device 5
[root@storage03 ~]# lctl conf_param obd_timeout=600
error: conf_param: Function not implemented
[root@storage03 ~]# lctl --device 5 conf_param obd_timeout=600
error: conf_param: Function not implemented

[root@storage03 ~]# lctl help conf_param
conf_param: set a permanent config param. This command must be run on
the MGS node
usage: conf_param <target.keyword=val> ...

[root@storage03 ~]# lctl conf_param home-md-MDT0000.obd_timeout=600
error: conf_param: Invalid argument
[root@storage03 ~]#


I searched all of /proc/*/lustre for a file that might store this timeout
value, but found nothing.
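
A search of this kind can be sketched roughly as follows. This is only an
illustration, not the exact command that was run: the helper name
find_timeout_files and the '*timeout*' file-name pattern are assumptions.

```shell
# Hypothetical helper: for each directory given, list regular files whose
# name contains "timeout" and print each path with its current contents.
find_timeout_files() {
    for root in "$@"; do
        # Skip arguments that are not directories (e.g. an unmatched glob).
        [ -d "$root" ] || continue
        find "$root" -type f -name '*timeout*' 2>/dev/null |
        while IFS= read -r f; do
            printf '%s: %s\n' "$f" "$(cat "$f" 2>/dev/null)"
        done
    done
}

# Example invocation on a Lustre node, using the glob mentioned above.
find_timeout_files /proc/*/lustre
```

The pattern matches on file names only, so a tunable stored under a name
that does not contain "timeout" would not be reported.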

Could someone advise how to change the value of the recovery timeout?

Cheers,

Wojciech Turek


