[Lustre-discuss] recovery status errors

Carlos Santana neubyr at gmail.com
Mon Jun 22 08:01:04 PDT 2009


Hello,

The Lustre server is logging the following errors related to recovery
mode. What could be the cause, and how can I fix it? I do remember
rebooting my server without unmounting the OSS and MDS nodes first.

Logs:

Jun 20 01:53:14 localhost kernel: Lustre:
5771:0:(mds_fs.c:674:mds_init_server_data()) RECOVERY: service
lustre-MDT0000, 1 recoverable clients, 0 delayed clients, last_transno
34359738368
Jun 20 01:53:14 localhost kernel: Lustre: MDT lustre-MDT0000 now
serving lustre-MDT0000_UUID
(lustre-MDT0000/e6123a33-d80d-bf1b-490b-49893680fa58), but will be in
recovery for at least 5:00, or until 1 client reconnect. During this
time new clients will not be allowed to connect. Recovery progress can
be monitored by watching
/proc/fs/lustre/mds/lustre-MDT0000/recovery_status.
Jun 20 01:53:14 localhost kernel: Lustre:
5771:0:(lproc_mds.c:271:lprocfs_wr_group_upcall()) lustre-MDT0000:
group upcall set to /usr/sbin/l_getgroups
Jun 20 01:53:14 localhost kernel: Lustre: lustre-MDT0000.mdt: set
parameter group_upcall=/usr/sbin/l_getgroups
Jun 20 01:53:14 localhost kernel: Lustre: Server lustre-MDT0000 on
device /dev/loop5 has started
Jun 20 01:53:19 localhost kernel: Lustre: Request
x18446744071689995159 sent from lustre-OST0000-osc to NID 0@lo 5s ago
has timed out (limit 5s).
Jun 20 01:53:28 localhost kernel: Lustre: lustre-MDT0000: temporarily
refusing client connection from 10.0.0.24@tcp
Jun 20 01:53:28 localhost kernel: LustreError:
5764:0:(ldlm_lib.c:1826:target_send_reply_msg()) @@@ processing error
(-11)  req@c8bc5400 x-1300233114/t0 o38-><?>@<?>:0/0 lens 368/0 e 0 to
0 dl 1245480908 ref 1 fl Interpret:/0/0 rc -11/0
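
The messages above say recovery progress can be monitored through the
recovery_status file. A minimal sketch of polling it from Python is
below. The path is copied from the MDT message; the "key: value" line
layout, the "RECOVERING" status value, and the helper names
(parse_recovery_status, wait_for_recovery) are assumptions for
illustration, not something taken from the logs.

```python
import time
from pathlib import Path

# Path copied from the MDT log message; the OST message names
# /proc/fs/lustre/obdfilter/lustre-OST0000/recovery_status instead.
RECOVERY_STATUS = Path("/proc/fs/lustre/mds/lustre-MDT0000/recovery_status")


def parse_recovery_status(text):
    """Turn 'key: value' lines into a dict (the layout is an assumption)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def wait_for_recovery(path=RECOVERY_STATUS, interval=5.0):
    """Print the parsed fields every `interval` seconds until recovery ends."""
    while True:
        fields = parse_recovery_status(path.read_text())
        print(fields)
        # "RECOVERING" as the in-progress value is an assumption.
        if fields.get("status") != "RECOVERING":
            return fields
        time.sleep(interval)
```

Calling wait_for_recovery() on the server would block until the status
field stops reading "RECOVERING"; the 5-second interval is arbitrary.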

-----------------------------

Lustre: 2057:0:(filter.c:999:filter_init_server_data()) RECOVERY:
service lustre-OST0000, 1 recoverable clients, 0 delayed clients,
last_rcvd 47244640256
Lustre: OST lustre-OST0000 now serving dev
(lustre-OST0000/072355c9-f254-9af8-4c05-bce872c287bf), but will be in
recovery for at least 5:00, or until 1 client reconnect. During this
time new clients will not be allowed to connect. Recovery progress can
be monitored by watching
/proc/fs/lustre/obdfilter/lustre-OST0000/recovery_status.
Lustre: Server lustre-OST0000 on device /dev/loop1 has started
Lustre: 1971:0:(import.c:508:import_select_connection())
lustre-OST0000-osc: tried all connections, increasing latency to 5s
Lustre: 2048:0:(ldlm_lib.c:1333:check_and_start_recovery_timer())
lustre-OST0000: starting recovery timer
LustreError: 2048:0:(ldlm_lib.c:884:target_handle_connect())
lustre-OST0000: denying connection for new client 0@lo
(lustre-mdtlov_UUID): 1 clients in recovery for 300s
LustreError: 2048:0:(ldlm_lib.c:1826:target_send_reply_msg()) @@@
processing error (-16)  req@ce5c5e00 x464519181/t0 o8-><?>@<?>:0/0
lens 368/264 e 0 to 0 dl 1245622846 ref 1 fl Interpret:/0/0 rc -16/0
LustreError: 11-0: an error occurred while communicating with 0@lo.
The ost_connect operation failed with -16

-----------------------------

Lustre: 1971:0:(import.c:508:import_select_connection())
lustre-OST0000-osc: tried all connections, increasing latency to 10s
LustreError: 2049:0:(ldlm_lib.c:884:target_handle_connect())
lustre-OST0000: denying connection for new client 0@lo
(lustre-mdtlov_UUID): 1 clients in recovery for 250s
LustreError: 2049:0:(ldlm_lib.c:1826:target_send_reply_msg()) @@@
processing error (-16)  req@c9a0d600 x464519184/t0 o8-><?>@<?>:0/0
lens 368/264 e 0 to 0 dl 1245622896 ref 1 fl Interpret:/0/0 rc -16/0
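
As far as I can tell, the rc values in these messages (-11 and -16)
are negated Linux errno codes. A small sketch for decoding them with
the Python standard library, assuming it is run on a Linux host;
describe_rc is just an illustrative helper name.

```python
import errno
import os


def describe_rc(rc):
    """Map a negative kernel return code such as -16 to (name, message)."""
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # The two return codes that appear in the log excerpts.
    for rc in (-11, -16):
        name, message = describe_rc(rc)
        print(rc, name, message)
```

The names and messages come from the platform's errno table, so the
output reflects whatever host the script runs on.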

-----------------------------


The full log messages are available here:
http://www.heypasteit.com/clip/92X
http://www.heypasteit.com/clip/92Y
http://www.heypasteit.com/clip/92Z

Any clues?

Thanks,
CS.


