[Lustre-discuss] Strange behavior of Lustre 1.6.7.2

Charland, Denis Denis.Charland at imi.cnrc-nrc.gc.ca
Tue Oct 5 12:46:04 PDT 2010


I have a cluster front-end node that is configured as a Lustre server. The MGS, MDT and OST all reside on this system.
The MGS and the MDT are located on a RAID-1 volume on the first RAID controller. The OST is located on a RAID-5 volume
(six 1.0 TB disks) on the second RAID controller. The system has two quad-core Intel Nehalem processors and 6 GB of RAM.
Hyper-threading is on.

The server is running Fedora 7 (F7) with patched kernel 2.6.22.14-72 and Lustre 1.6.7.2. The server is also a client
of itself (the file system is mounted on /home1), in addition to the 64 compute nodes and a few workstations.

When I load the server with the command "find /home1 -inum 100", the following error messages are written to the
server system logfile (/var/log/messages):

Oct  5 11:08:37 fn1 kernel: Lustre: Request x260093 sent from fn1home1-OST0000-osc to NID 0 at lo 50s a
go has timed out (limit 50s).
Oct  5 11:08:37 fn1 kernel: Lustre: fn1home1-OST0000-osc: Connection to service fn1home1-OST0000 via
 nid 0 at lo was lost; in progress operations using this service will wait for recovery to complete.
Oct  5 11:08:37 fn1 kernel: Lustre: 3580:0:(ldlm_lib.c:541:target_handle_reconnect()) fn1home1-OST00
00: fn1home1-mdtlov_UUID reconnecting
Oct  5 11:08:37 fn1 kernel: Lustre: 3580:0:(ldlm_lib.c:780:target_handle_connect()) fn1home1-OST0000
: refuse reconnection from fn1home1-mdtlov_UUID at 0@lo to 0xffff8101e9cba000; still busy with 2 active
 RPCs
Oct  5 11:08:37 fn1 kernel: LustreError: 3580:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ proces
sing error (-16)  req at ffff81015e86a800 x260114/t0 o8->fn1home1-mdtlov_UUID at 172.17.15.20@tcp:0/0 lens
 304/200 e 0 to 0 dl 1286291417 ref 1 fl Interpret:/0/0 rc -16/0
Oct  5 11:08:37 fn1 kernel: LustreError: 3580:0:(ldlm_lib.c:1643:target_send_reply_msg()) Skipped 19
 previous similar messages
Oct  5 11:08:37 fn1 kernel: LustreError: 11-0: an error occurred while communicating with 0 at lo. The
ost_connect operation failed with -16
Oct  5 11:09:02 fn1 kernel: Lustre: 3152:0:(import.c:507:import_select_connection()) fn1home1-OST000
0-osc: tried all connections, increasing latency to 6s
Oct  5 11:09:02 fn1 kernel: Lustre: 3521:0:(ldlm_lib.c:541:target_handle_reconnect()) fn1home1-OST00
00: fn1home1-mdtlov_UUID reconnecting
Oct  5 11:09:02 fn1 kernel: Lustre: 3521:0:(ldlm_lib.c:780:target_handle_connect()) fn1home1-OST0000
: refuse reconnection from fn1home1-mdtlov_UUID at 0@lo to 0xffff8101e9cba000; still busy with 2 active
 RPCs
Oct  5 11:09:02 fn1 kernel: LustreError: 3521:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ proces
sing error (-16)  req at ffff810112f3f850 x260116/t0 o8->fn1home1-mdtlov_UUID at 172.17.15.20@tcp:0/0 lens
 304/200 e 0 to 0 dl 1286291442 ref 1 fl Interpret:/0/0 rc -16/0
Oct  5 11:09:02 fn1 kernel: LustreError: 11-0: an error occurred while communicating with 0 at lo. The
ost_connect operation failed with -16
Oct  5 11:09:10 fn1 kernel: Lustre: Request x260076 sent from fn1home1-OST0000-osc-ffff8101ee411000
to NID 0 at lo 100s ago has timed out (limit 100s).
Oct  5 11:09:10 fn1 kernel: Lustre: fn1home1-OST0000-osc-ffff8101ee411000: Connection to service fn1
home1-OST0000 via nid 0 at lo was lost; in progress operations using this service will wait for recover
y to complete.
Oct  5 11:09:10 fn1 kernel: Lustre: 3591:0:(ldlm_lib.c:541:target_handle_reconnect()) fn1home1-OST00
00: 12f4dd8a-1056-861c-2a01-70095593af9f reconnecting
Oct  5 11:09:10 fn1 kernel: Lustre: 3591:0:(ldlm_lib.c:780:target_handle_connect()) fn1home1-OST0000
: refuse reconnection from 12f4dd8a-1056-861c-2a01-70095593af9f at 0@lo to 0xffff8101119f6000; still bu
sy with 3 active RPCs
Oct  5 11:09:10 fn1 kernel: LustreError: 11-0: an error occurred while communicating with 0 at lo. The
ost_connect operation failed with -16
Oct  5 11:09:10 fn1 kernel: LustreError: 3542:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -
11 from cancel RPC: canceling anyway
Oct  5 11:09:10 fn1 kernel: LustreError: 3542:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) ldlm_cl
i_cancel_list: -11
Oct  5 11:09:10 fn1 kernel: LustreError: 3542:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -
11 from cancel RPC: canceling anyway
Oct  5 11:09:10 fn1 kernel: LustreError: 3542:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) ldlm_cl
i_cancel_list: -11
Oct  5 11:09:10 fn1 kernel: LustreError: 3542:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -
11 from cancel RPC: canceling anyway
Oct  5 11:09:10 fn1 kernel: LustreError: 3542:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Skipped
136 previous similar messages
Oct  5 11:09:10 fn1 kernel: LustreError: 3542:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) ldlm_cl
i_cancel_list: -11
Oct  5 11:09:10 fn1 kernel: LustreError: 3542:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) Skipped
 136 previous similar messages
Oct  5 11:09:10 fn1 kernel: LustreError: 3355:0:(filter.c:1229:filter_parent_lock()) fn1home1-OST000
0: slow parent lock 100s
Oct  5 11:09:10 fn1 kernel: LustreError: 3596:0:(filter.c:1229:filter_parent_lock()) fn1home1-OST000
0: slow parent lock 64s
Oct  5 11:09:10 fn1 kernel: Lustre: 6833:0:(service.c:1317:ptlrpc_server_handle_request()) @@@ Reque
st x260093 took longer than estimated (50+33s); client may timeout.  req at ffff810158400400 x260093/t0
 o5->fn1home1-mdtlov_UUID at 172.17.15.20@tcp:0/0 lens 336/336 e 0 to 0 dl 1286291317 ref 1 fl Complete
:/0/0 rc 0/0
Oct  5 11:09:10 fn1 kernel: LustreError: 3596:0:(filter.c:1229:filter_parent_lock()) Skipped 1 previ
ous similar message
Oct  5 11:09:35 fn1 kernel: Lustre: 3152:0:(import.c:507:import_select_connection()) fn1home1-OST000
0-osc: tried all connections, increasing latency to 11s
Oct  5 11:09:35 fn1 kernel: Lustre: 3524:0:(ldlm_lib.c:541:target_handle_reconnect()) fn1home1-OST00
00: 12f4dd8a-1056-861c-2a01-70095593af9f reconnecting
Oct  5 11:09:35 fn1 kernel: Lustre: fn1home1-OST0000-osc: Connection restored to service fn1home1-OS
T0000 using nid 0 at lo.
Oct  5 11:09:35 fn1 kernel: Lustre: 3524:0:(ldlm_lib.c:541:target_handle_reconnect()) Skipped 1 prev
ious similar message
Oct  5 11:09:35 fn1 kernel: Lustre: fn1home1-OST0000: received MDS connection from 0 at lo
Oct  5 11:09:35 fn1 kernel: Lustre: MDS fn1home1-MDT0000: fn1home1-OST0000_UUID now active, resettin
g orphans
Oct  5 11:09:35 fn1 kernel: Lustre: 3619:0:(filter.c:3077:filter_precreate()) fn1home1-OST0000: prec
reate aborted by destroy
Oct  5 11:09:35 fn1 kernel: Lustre: 3566:0:(filter.c:2849:filter_destroy_precreated()) fn1home1-OST0
000: deleting orphan objects from 3096267 to 3096299
.
.
.
Oct  5 11:43:10 fn1 kernel: Lustre: 3513:0:(ldlm_lib.c:541:target_handle_reconnect()) fn1home1-OST00
00: a80ed6b4-a776-a472-f0cb-43d1af16ebce reconnecting
Oct  5 11:43:10 fn1 kernel: Lustre: 3513:0:(ldlm_lib.c:780:target_handle_connect()) fn1home1-OST0000
: refuse reconnection from a80ed6b4-a776-a472-f0cb-43d1af16ebce at 172.17.15.18@tcp to 0xffff81011959c0
00; still busy with 2 active RPCs
Oct  5 11:43:10 fn1 kernel: LustreError: 3513:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ proces
sing error (-16)  req at ffff8101e919dc00 x1348683852632032/t0 o8->a80ed6b4-a776-a472-f0cb-43d1af16ebce
@NET_0x20000ac110f12_UUID:0/0 lens 368/200 e 0 to 0 dl 1286293490 ref 1 fl Interpret:/0/0 rc -16/0
Oct  5 11:43:10 fn1 kernel: LustreError: 3513:0:(ldlm_lib.c:1643:target_send_reply_msg()) Skipped 1
previous similar message
Oct  5 11:43:17 fn1 kernel: Lustre: 3564:0:(ldlm_lib.c:541:target_handle_reconnect()) fn1home1-OST00
00: a80ed6b4-a776-a472-f0cb-43d1af16ebce reconnecting
Oct  5 11:43:17 fn1 kernel: Lustre: 3564:0:(ldlm_lib.c:780:target_handle_connect()) fn1home1-OST0000
: refuse reconnection from a80ed6b4-a776-a472-f0cb-43d1af16ebce at 172.17.15.18@tcp to 0xffff81011959c0
00; still busy with 2 active RPCs
Oct  5 11:43:17 fn1 kernel: LustreError: 3564:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ proces
sing error (-16)  req at ffff8101e97a9400 x1348683852632033/t0 o8->a80ed6b4-a776-a472-f0cb-43d1af16ebce
@NET_0x20000ac110f12_UUID:0/0 lens 368/200 e 0 to 0 dl 1286293497 ref 1 fl Interpret:/0/0 rc -16/0
Oct  5 11:43:24 fn1 kernel: Lustre: 3539:0:(ldlm_lib.c:541:target_handle_reconnect()) fn1home1-OST00
00: a80ed6b4-a776-a472-f0cb-43d1af16ebce reconnecting
Oct  5 11:43:24 fn1 kernel: Lustre: 3539:0:(ldlm_lib.c:780:target_handle_connect()) fn1home1-OST0000
: refuse reconnection from a80ed6b4-a776-a472-f0cb-43d1af16ebce at 172.17.15.18@tcp to 0xffff81011959c0
00; still busy with 2 active RPCs
Oct  5 11:43:24 fn1 kernel: LustreError: 3539:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ proces
sing error (-16)  req at ffff8101dd1d1400 x1348683852632034/t0 o8->a80ed6b4-a776-a472-f0cb-43d1af16ebce
@NET_0x20000ac110f12_UUID:0/0 lens 368/200 e 0 to 0 dl 1286293504 ref 1 fl Interpret:/0/0 rc -16/0
Oct  5 11:43:31 fn1 kernel: Lustre: 3563:0:(ldlm_lib.c:541:target_handle_reconnect()) fn1home1-OST00
00: a80ed6b4-a776-a472-f0cb-43d1af16ebce reconnecting
Oct  5 11:43:31 fn1 kernel: Lustre: 3563:0:(ldlm_lib.c:780:target_handle_connect()) fn1home1-OST0000
: refuse reconnection from a80ed6b4-a776-a472-f0cb-43d1af16ebce at 172.17.15.18@tcp to 0xffff81011959c0
00; still busy with 2 active RPCs
Oct  5 11:43:31 fn1 kernel: LustreError: 3563:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ proces
sing error (-16)  req at ffff810172c25000 x1348683852632035/t0 o8->a80ed6b4-a776-a472-f0cb-43d1af16ebce
@NET_0x20000ac110f12_UUID:0/0 lens 368/200 e 0 to 0 dl 1286293511 ref 1 fl Interpret:/0/0 rc -16/0
Oct  5 11:43:38 fn1 kernel: Lustre: 3557:0:(ldlm_lib.c:541:target_handle_reconnect()) fn1home1-OST00
00: a80ed6b4-a776-a472-f0cb-43d1af16ebce reconnecting
Oct  5 11:43:38 fn1 kernel: Lustre: 3557:0:(ldlm_lib.c:780:target_handle_connect()) fn1home1-OST0000
: refuse reconnection from a80ed6b4-a776-a472-f0cb-43d1af16ebce at 172.17.15.18@tcp to 0xffff81011959c0
00; still busy with 2 active RPCs
Oct  5 11:43:45 fn1 kernel: Lustre: 3498:0:(ldlm_lib.c:541:target_handle_reconnect()) fn1home1-OST00
00: a80ed6b4-a776-a472-f0cb-43d1af16ebce reconnecting
Oct  5 11:43:45 fn1 kernel: Lustre: 3498:0:(ldlm_lib.c:780:target_handle_connect()) fn1home1-OST0000
: refuse reconnection from a80ed6b4-a776-a472-f0cb-43d1af16ebce at 172.17.15.18@tcp to 0xffff81011959c0
00; still busy with 2 active RPCs
Oct  5 11:43:45 fn1 kernel: LustreError: 3498:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ proces
sing error (-16)  req at ffff8101dd1d1c00 x1348683852632039/t0 o8->a80ed6b4-a776-a472-f0cb-43d1af16ebce
@NET_0x20000ac110f12_UUID:0/0 lens 368/200 e 0 to 0 dl 1286293525 ref 1 fl Interpret:/0/0 rc -16/0
Oct  5 11:43:45 fn1 kernel: LustreError: 3498:0:(ldlm_lib.c:1643:target_send_reply_msg()) Skipped 1
previous similar message
Oct  5 11:43:59 fn1 kernel: Lustre: 3560:0:(ldlm_lib.c:541:target_handle_reconnect()) fn1home1-OST00
00: a80ed6b4-a776-a472-f0cb-43d1af16ebce reconnecting
Oct  5 11:43:59 fn1 kernel: Lustre: 3560:0:(ldlm_lib.c:541:target_handle_reconnect()) Skipped 1 prev
ious similar message
Oct  5 11:43:59 fn1 kernel: Lustre: 3560:0:(ldlm_lib.c:780:target_handle_connect()) fn1home1-OST0000
: refuse reconnection from a80ed6b4-a776-a472-f0cb-43d1af16ebce at 172.17.15.18@tcp to 0xffff81011959c0
00; still busy with 2 active RPCs
Oct  5 11:43:59 fn1 kernel: Lustre: 3560:0:(ldlm_lib.c:780:target_handle_connect()) Skipped 1 previo
us similar message
Oct  5 11:44:06 fn1 kernel: LustreError: 3563:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ proces
sing error (-16)  req at ffff810137723c00 x1348683852632044/t0 o8->a80ed6b4-a776-a472-f0cb-43d1af16ebce
@NET_0x20000ac110f12_UUID:0/0 lens 368/200 e 0 to 0 dl 1286293546 ref 1 fl Interpret:/0/0 rc -16/0
Oct  5 11:44:06 fn1 kernel: LustreError: 3563:0:(ldlm_lib.c:1643:target_send_reply_msg()) Skipped 2
previous similar messages
Oct  5 11:44:20 fn1 kernel: Lustre: 3553:0:(ldlm_lib.c:541:target_handle_reconnect()) fn1home1-OST00
00: a80ed6b4-a776-a472-f0cb-43d1af16ebce reconnecting
Oct  5 11:44:20 fn1 kernel: Lustre: 3553:0:(ldlm_lib.c:541:target_handle_reconnect()) Skipped 2 prev
ious similar messages
Oct  5 11:44:20 fn1 kernel: Lustre: 3553:0:(ldlm_lib.c:780:target_handle_connect()) fn1home1-OST0000
: refuse reconnection from a80ed6b4-a776-a472-f0cb-43d1af16ebce at 172.17.15.18@tcp to 0xffff81011959c0
00; still busy with 2 active RPCs
Oct  5 11:44:20 fn1 kernel: Lustre: 3553:0:(ldlm_lib.c:780:target_handle_connect()) Skipped 2 previo
us similar messages
Oct  5 11:44:41 fn1 kernel: LustreError: 3550:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ proces
sing error (-16)  req at ffff81011af54c50 x1348683852632051/t0 o8->a80ed6b4-a776-a472-f0cb-43d1af16ebce
@NET_0x20000ac110f12_UUID:0/0 lens 368/200 e 0 to 0 dl 1286293581 ref 1 fl Interpret:/0/0 rc -16/0
Oct  5 11:44:41 fn1 kernel: LustreError: 3550:0:(ldlm_lib.c:1643:target_send_reply_msg()) Skipped 4
previous similar messages
Oct  5 11:44:43 fn1 kernel: Lustre: Request x613043 sent from fn1home1-OST0000-osc-ffff8101ee411000
to NID 0 at lo 100s ago has timed out (limit 100s).
Oct  5 11:44:43 fn1 kernel: Lustre: Skipped 1 previous similar message
Oct  5 11:44:43 fn1 kernel: Lustre: fn1home1-OST0000-osc-ffff8101ee411000: Connection to service fn1
home1-OST0000 via nid 0 at lo was lost; in progress operations using this service will wait for recover
y to complete.
Oct  5 11:44:43 fn1 kernel: LustreError: 9759:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -
11 from cancel RPC: canceling anyway
Oct  5 11:44:43 fn1 kernel: LustreError: 11-0: an error occurred while communicating with 0 at lo. The
ost_connect operation failed with -16
Oct  5 11:44:43 fn1 kernel: LustreError: 9759:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Skipped
124 previous similar messages
Oct  5 11:44:43 fn1 kernel: LustreError: 9759:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) ldlm_cl
i_cancel_list: -11
Oct  5 11:44:43 fn1 kernel: LustreError: 9759:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) Skipped
 124 previous similar messages
Oct  5 11:44:43 fn1 kernel: LustreError: 9759:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -
11 from cancel RPC: canceling anyway
Oct  5 11:44:43 fn1 kernel: LustreError: 9759:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Skipped
9 previous similar messages
Oct  5 11:44:43 fn1 kernel: LustreError: 9759:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) ldlm_cl
i_cancel_list: -11
Oct  5 11:44:43 fn1 kernel: LustreError: 9759:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) Skipped
 9 previous similar messages
Oct  5 11:44:43 fn1 kernel: Lustre: Request x613079 sent from fn1home1-OST0000-osc-ffff8101ee411000
to NID 0 at lo 100s ago has timed out (limit 100s).
Oct  5 11:44:44 fn1 kernel: LustreError: 9759:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -
11 from cancel RPC: canceling anyway
Oct  5 11:44:44 fn1 kernel: LustreError: 9759:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Skipped
97 previous similar messages
Oct  5 11:44:44 fn1 kernel: LustreError: 9759:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) ldlm_cl
i_cancel_list: -11
Oct  5 11:44:44 fn1 kernel: LustreError: 9759:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) Skipped
 97 previous similar messages
Oct  5 11:44:44 fn1 kernel: Lustre: Request x613181 sent from fn1home1-OST0000-osc-ffff8101ee411000
to NID 0 at lo 100s ago has timed out (limit 100s).
Oct  5 11:44:44 fn1 kernel: LustreError: 3609:0:(filter.c:1229:filter_parent_lock()) fn1home1-OST000
0: slow parent lock 100s
Oct  5 11:44:44 fn1 kernel: Lustre: 3609:0:(service.c:1317:ptlrpc_server_handle_request()) @@@ Reque
st x1348683852632028 took longer than estimated (6+95s); client may timeout.  req at ffff810137723000 x
1348683852632028/t0 o101->a80ed6b4-a776-a472-f0cb-43d1af16ebce at NET_0x20000ac110f12_UUID:0/0 lens 296
/288 e 0 to 0 dl 1286293389 ref 1 fl Complete:/0/0 rc 0/0
Oct  5 11:44:45 fn1 kernel: Lustre: 3351:0:(service.c:1317:ptlrpc_server_handle_request()) @@@ Reque
st x613079 took longer than estimated (100+2s); client may timeout.  req at ffff81011d843c00 x613079/t0
 o103->12f4dd8a-1056-861c-2a01-70095593af9f at 172.17.15.20@tcp:0/0 lens 688/128 e 0 to 0 dl 1286293483
 ref 1 fl Complete:/0/0 rc 0/0
Oct  5 11:45:08 fn1 kernel: Lustre: 3152:0:(import.c:507:import_select_connection()) fn1home1-OST000
0-osc-ffff8101ee411000: tried all connections, increasing latency to 6s
Oct  5 11:45:08 fn1 kernel: Lustre: 3152:0:(import.c:507:import_select_connection()) Skipped 1 previ
ous similar message
Oct  5 11:45:08 fn1 kernel: Lustre: 3575:0:(ldlm_lib.c:541:target_handle_reconnect()) fn1home1-OST00
00: 12f4dd8a-1056-861c-2a01-70095593af9f reconnecting
Oct  5 11:45:08 fn1 kernel: Lustre: 3575:0:(ldlm_lib.c:541:target_handle_reconnect()) Skipped 5 prev
ious similar messages
Oct  5 11:45:08 fn1 kernel: Lustre: fn1home1-OST0000-osc-ffff8101ee411000: Connection restored to se
rvice fn1home1-OST0000 using nid 0 at lo.
Oct  5 11:45:08 fn1 kernel: Lustre: Skipped 1 previous similar message
Oct  5 12:26:36 fn1 kernel: Lustre: Request x858420 sent from fn1home1-MDT0000-mdc-ffff8101ee411000
to NID 0 at lo 100s ago has timed out (limit 100s).
Oct  5 12:26:36 fn1 kernel: Lustre: fn1home1-MDT0000-mdc-ffff8101ee411000: Connection to service fn1
home1-MDT0000 via nid 0 at lo was lost; in progress operations using this service will wait for recover
y to complete.
Oct  5 12:26:36 fn1 kernel: Lustre: 3419:0:(ldlm_lib.c:541:target_handle_reconnect()) fn1home1-MDT00
00: 12f4dd8a-1056-861c-2a01-70095593af9f reconnecting
Oct  5 12:26:36 fn1 kernel: Lustre: 3419:0:(ldlm_lib.c:780:target_handle_connect()) fn1home1-MDT0000
: refuse reconnection from 12f4dd8a-1056-861c-2a01-70095593af9f at 0@lo to 0xffff8101e9f32000; still bu
sy with 2 active RPCs
Oct  5 12:26:36 fn1 kernel: Lustre: 3419:0:(ldlm_lib.c:780:target_handle_connect()) Skipped 4 previo
us similar messages
Oct  5 12:26:36 fn1 kernel: LustreError: 3419:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ proces
sing error (-16)  req at ffff810115e7d450 x858794/t0 o38->12f4dd8a-1056-861c-2a01-70095593af9f at 172.17.1
5.20 at tcp:0/0 lens 304/200 e 0 to 0 dl 1286296096 ref 1 fl Interpret:/0/0 rc -16/0
Oct  5 12:26:36 fn1 kernel: LustreError: 3419:0:(ldlm_lib.c:1643:target_send_reply_msg()) Skipped 1
previous similar message
Oct  5 12:26:36 fn1 kernel: LustreError: 3417:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -
11 from cancel RPC: canceling anyway
Oct  5 12:26:36 fn1 kernel: LustreError: 3417:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Skipped
219 previous similar messages
Oct  5 12:26:36 fn1 kernel: LustreError: 3417:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) ldlm_cl
i_cancel_list: -11
Oct  5 12:26:36 fn1 kernel: LustreError: 3417:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) Skipped
 219 previous similar messages
Oct  5 12:26:36 fn1 kernel: LustreError: 11-0: an error occurred while communicating with 0 at lo. The
mds_connect operation failed with -16
Oct  5 12:26:36 fn1 kernel: LustreError: 3417:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -
11 from cancel RPC: canceling anyway
Oct  5 12:26:36 fn1 kernel: LustreError: 3417:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) ldlm_cl
i_cancel_list: -11
Oct  5 12:26:37 fn1 kernel: LustreError: 3417:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -
11 from cancel RPC: canceling anyway
Oct  5 12:26:37 fn1 kernel: LustreError: 3417:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) ldlm_cl
i_cancel_list: -11
Oct  5 12:26:38 fn1 kernel: LustreError: 3417:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -
11 from cancel RPC: canceling anyway
Oct  5 12:26:38 fn1 kernel: LustreError: 3417:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Skipped
2 previous similar messages
Oct  5 12:26:38 fn1 kernel: LustreError: 3417:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) ldlm_cl
i_cancel_list: -11
Oct  5 12:26:38 fn1 kernel: LustreError: 3417:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) Skipped
 2 previous similar messages
Oct  5 12:26:40 fn1 kernel: LustreError: 3417:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -
11 from cancel RPC: canceling anyway
Oct  5 12:26:40 fn1 kernel: LustreError: 3417:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Skipped
4 previous similar messages
Oct  5 12:26:40 fn1 kernel: LustreError: 3417:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) ldlm_cl
i_cancel_list: -11
Oct  5 12:26:40 fn1 kernel: LustreError: 3417:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) Skipped
 4 previous similar messages
Oct  5 12:26:45 fn1 kernel: LustreError: 3417:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -
11 from cancel RPC: canceling anyway
Oct  5 12:26:45 fn1 kernel: LustreError: 3417:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Skipped
7 previous similar messages
Oct  5 12:26:45 fn1 kernel: LustreError: 3417:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) ldlm_cl
i_cancel_list: -11
Oct  5 12:26:45 fn1 kernel: LustreError: 3417:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) Skipped
 7 previous similar messages
Oct  5 12:26:53 fn1 kernel: LustreError: 3417:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -
11 from cancel RPC: canceling anyway
Oct  5 12:26:53 fn1 kernel: LustreError: 3417:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Skipped
18 previous similar messages
Oct  5 12:26:53 fn1 kernel: LustreError: 3417:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) ldlm_cl
i_cancel_list: -11
Oct  5 12:26:53 fn1 kernel: LustreError: 3417:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) Skipped
 18 previous similar messages
Oct  5 12:27:01 fn1 kernel: Lustre: 3152:0:(import.c:507:import_select_connection()) fn1home1-MDT000
0-mdc-ffff8101ee411000: tried all connections, increasing latency to 6s
Oct  5 12:27:01 fn1 kernel: Lustre: 3408:0:(ldlm_lib.c:541:target_handle_reconnect()) fn1home1-MDT00
00: 12f4dd8a-1056-861c-2a01-70095593af9f reconnecting
Oct  5 12:27:01 fn1 kernel: Lustre: 3408:0:(ldlm_lib.c:780:target_handle_connect()) fn1home1-MDT0000
: refuse reconnection from 12f4dd8a-1056-861c-2a01-70095593af9f at 0@lo to 0xffff8101e9f32000; still bu
sy with 2 active RPCs
Oct  5 12:27:01 fn1 kernel: LustreError: 3408:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ proces
sing error (-16)  req at ffff810197228400 x858911/t0 o38->12f4dd8a-1056-861c-2a01-70095593af9f at 172.17.1
5.20 at tcp:0/0 lens 304/200 e 0 to 0 dl 1286296121 ref 1 fl Interpret:/0/0 rc -16/0
Oct  5 12:27:01 fn1 kernel: LustreError: 11-0: an error occurred while communicating with 0 at lo. The
mds_connect operation failed with -16
.
.
.

And so on...

What are these messages trying to tell me?
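
One place to start reading the log is the negative numbers in these messages, which are kernel return codes (negated errno values). The following is a minimal Python sketch, not part of the original report, that tallies and names them. It assumes Linux errno numbering (16 = EBUSY, 11 = EAGAIN) and assumes each message sits on a single line, as it would in /var/log/messages rather than in the wrapped form shown in this mail. The names RC_PATTERN, LINUX_ERRNO, decode_rc and summarize are hypothetical; the sample lines are taken from the log above, with the second one abbreviated.

```python
import errno
import os
import re
from collections import Counter

# Return codes as they appear in the messages above, for example
# "processing error (-16)", "failed with -16", "Got rc -11",
# "ldlm_cli_cancel_list: -11".
RC_PATTERN = re.compile(r"(?:error \(|failed with |Got rc |cancel_list: )(-\d+)")

# The two codes seen in this log, with Linux numbering (an assumption:
# other platforms number some errno values differently).
LINUX_ERRNO = {
    11: ("EAGAIN", "Resource temporarily unavailable"),
    16: ("EBUSY", "Device or resource busy"),
}


def decode_rc(rc):
    """Return (symbolic name, description) for a negative return code."""
    code = abs(rc)
    if code in LINUX_ERRNO:
        return LINUX_ERRNO[code]
    # Fall back to whatever the host platform calls this number.
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def summarize(lines):
    """Count each return code found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for match in RC_PATTERN.finditer(line):
            counts[int(match.group(1))] += 1
    return counts


# Sample lines from the log above, rejoined onto single lines.
SAMPLE = [
    "Oct  5 11:08:37 fn1 kernel: LustreError: 11-0: an error occurred while "
    "communicating with 0 at lo. The ost_connect operation failed with -16",
    "Oct  5 11:08:37 fn1 kernel: LustreError: 3580:0:(ldlm_lib.c:1643:"
    "target_send_reply_msg()) @@@ processing error (-16)",
    "Oct  5 11:09:10 fn1 kernel: LustreError: 3542:0:(ldlm_request.c:1033:"
    "ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway",
    "Oct  5 11:09:10 fn1 kernel: LustreError: 3542:0:(ldlm_request.c:1622:"
    "ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11",
]

if __name__ == "__main__":
    for rc, count in sorted(summarize(SAMPLE).items()):
        name, text = decode_rc(rc)
        print(f"{rc:>4} {name:<7} x{count}  {text}")
```

In the log above, -16 (EBUSY) accompanies the "refuse reconnection ... still busy with N active RPCs" messages, and -11 (EAGAIN) accompanies the failed lock-cancel RPCs. The sketch only names the codes; it does not explain why the OST and MDT are slow to answer.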

Denis Charland, ing. | P. Eng.
Administrateur de Systèmes UNIX | UNIX Systems Administrator
Tél. | tel. (450) 641-5078     Fax (450) 641-5106
Courriel | E-mail : denis.charland at cnrc-nrc.gc.ca

Institut des matériaux industriels | Industrial Materials Institute
Conseil national de recherches Canada | National Research Council Canada
75, de Mortagne, Boucherville, Québec, Canada, J4B 6Y4
Gouvernement du Canada | Government of Canada
