[Lustre-discuss] I/O error on clients

Peter Kitchener pkitchener at cmcrc.com
Mon Jul 5 23:19:10 PDT 2010


Hi all, 

I have been troubleshooting a strange problem with our Lustre setup. Under high load, our developers report that various processes they run fail with an I/O error.

Our setup is small: 1 MDS and 2 OSSes (10 OSTs, 5 per OSS), and 13 clients (152 cores). The storage is all local, 60TB usable (30TB per OSS), in a software RAID6 setup. All of the machines are connected via 10 Gigabit Ethernet. The clients run Rocks 5.3 (CentOS 5.4) and the servers run CentOS 5.4 with kernel 2.6.18-164.11.1.el5_lustre.1.8.2. The clients run an unpatched vanilla kernel from CentOS and Lustre 1.8.3.

So far I've not been able to pinpoint where I should begin to look. I have been trawling through log files that, quite frankly, don't make much sense to me.

Here is the messages output from the OSS:

##############################

Jul  6 14:57:11 helium kernel: Lustre: AC3-OST0005: haven't heard from client ce1a3eb7-8514-d16e-4050-0507e82f1116 (at 172.16.16.125 at tcp) in 227 seconds. I think it's dead, and I am evicting it.
Jul  6 15:08:26 helium kernel: Lustre: 6539:0:(ldlm_lib.c:540:target_handle_reconnect()) AC3-OST0006: 593eb160-edce-8ceb-3f93-6f743cacd1a1 reconnecting
Jul  6 15:08:26 helium kernel: Lustre: 6539:0:(ldlm_lib.c:837:target_handle_connect()) AC3-OST0006: refuse reconnection from 593eb160-edce-8ceb-3f93-6f743cacd1a1 at 10.0.0.54@tcp to 0xffff81026241b800; still busy with 5 active RPCs
Jul  6 15:08:26 helium kernel: LustreError: 6539:0:(ldlm_lib.c:1848:target_send_reply_msg()) @@@ processing error (-16)  req at ffff810282fe1000 x1340041377953748/t0 o8->593eb160-edce-8ceb-3f93-6f743cacd1a1 at NET_0x200000a000036_UUID:0/0 lens 368/264 e 0 to 0 dl 1278393006 ref 1 fl Interpret:/0/0 rc -16/0
Jul  6 15:08:26 helium kernel: LustreError: 6660:0:(ost_handler.c:1061:ost_brw_write()) @@@ Reconnect on bulk GET  req at ffff810357883c00 x1340041377934618/t0 o4->593eb160-edce-8ceb-3f93-6f743cacd1a1 at NET_0x200000a000036_UUID:0/0 lens 448/416 e 0 to 0 dl 1278392947 ref 1 fl Interpret:/0/0 rc 0/0
Jul  6 15:08:26 helium kernel: LustreError: 6660:0:(ost_handler.c:1061:ost_brw_write()) Skipped 1 previous similar message
Jul  6 15:08:26 helium kernel: LustreError: 6704:0:(ost_handler.c:1061:ost_brw_write()) @@@ Reconnect on bulk GET  req at ffff81082af44050 x1340041377934964/t0 o4->593eb160-edce-8ceb-3f93-6f743cacd1a1 at NET_0x200000a000036_UUID:0/0 lens 448/416 e 0 to 0 dl 1278392947 ref 1 fl Interpret:/0/0 rc 0/0
Jul  6 15:08:27 helium kernel: Lustre: 7062:0:(ldlm_lib.c:540:target_handle_reconnect()) AC3-OST0008: 593eb160-edce-8ceb-3f93-6f743cacd1a1 reconnecting
Jul  6 15:08:27 helium kernel: Lustre: 7062:0:(ldlm_lib.c:837:target_handle_connect()) AC3-OST0008: refuse reconnection from 593eb160-edce-8ceb-3f93-6f743cacd1a1 at 10.0.0.54@tcp to 0xffff81025a535e00; still busy with 3 active RPCs
Jul  6 15:08:27 helium kernel: LustreError: 7062:0:(ldlm_lib.c:1848:target_send_reply_msg()) @@@ processing error (-16)  req at ffff810698233850 x1340041377955630/t0 o8->593eb160-edce-8ceb-3f93-6f743cacd1a1 at NET_0x200000a000036_UUID:0/0 lens 368/264 e 0 to 0 dl 1278393007 ref 1 fl Interpret:/0/0 rc -16/0
Jul  6 15:08:27 helium kernel: Lustre: 6692:0:(ost_handler.c:1219:ost_brw_write()) AC3-OST0006: ignoring bulk IO comm error with 593eb160-edce-8ceb-3f93-6f743cacd1a1 at NET_0x200000a000036_UUID id 12345-10.0.0.54 at tcp - client will retry
Jul  6 15:08:27 helium kernel: Lustre: 6692:0:(ost_handler.c:1219:ost_brw_write()) Skipped 6 previous similar messages
Jul  6 15:08:27 helium kernel: LustreError: 6720:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT  req at ffff8103a72aec00 x1340041377933615/t0 o3->593eb160-edce-8ceb-3f93-6f743cacd1a1 at NET_0x200000a000036_UUID:0/0 lens 448/400 e 0 to 0 dl 1278392946 ref 1 fl Interpret:/0/0 rc 0/0
Jul  6 15:08:27 helium kernel: LustreError: 6720:0:(ost_handler.c:829:ost_brw_read()) Skipped 1 previous similar message
Jul  6 15:08:29 helium kernel: Lustre: 6720:0:(ost_handler.c:886:ost_brw_read()) AC3-OST0008: ignoring bulk IO comm error with 593eb160-edce-8ceb-3f93-6f743cacd1a1 at NET_0x200000a000036_UUID id 12345-10.0.0.54 at tcp - client will retry
Jul  6 15:08:29 helium kernel: Lustre: 6720:0:(ost_handler.c:886:ost_brw_read()) Skipped 1 previous similar message
Jul  6 15:08:37 helium kernel: Lustre: 7058:0:(ldlm_lib.c:540:target_handle_reconnect()) AC3-OST0006: 593eb160-edce-8ceb-3f93-6f743cacd1a1 reconnecting
Jul  6 15:08:39 helium kernel: Lustre: 6522:0:(ldlm_lib.c:540:target_handle_reconnect()) AC3-OST0008: 593eb160-edce-8ceb-3f93-6f743cacd1a1 reconnecting
Jul  6 15:10:09 helium kernel: Lustre: 7064:0:(ldlm_lib.c:540:target_handle_reconnect()) AC3-OST0008: e6ce7565-0b86-5925-82fc-015770c5143c reconnecting
Jul  6 15:10:09 helium kernel: Lustre: 7064:0:(ldlm_lib.c:837:target_handle_connect()) AC3-OST0008: refuse reconnection from e6ce7565-0b86-5925-82fc-015770c5143c at 10.0.0.53@tcp to 0xffff810267bbc400; still busy with 1 active RPCs
Jul  6 15:10:09 helium kernel: LustreError: 7064:0:(ldlm_lib.c:1848:target_send_reply_msg()) @@@ processing error (-16)  req at ffff81069588b800 x1340036504045855/t0 o8->e6ce7565-0b86-5925-82fc-015770c5143c at NET_0x200000a000035_UUID:0/0 lens 368/264 e 0 to 0 dl 1278393109 ref 1 fl Interpret:/0/0 rc -16/0
Jul  6 15:10:09 helium kernel: LustreError: 6649:0:(ost_handler.c:829:ost_brw_read()) @@@ Reconnect on bulk PUT  req at ffff81053a96f400 x1340036504013451/t0 o3->e6ce7565-0b86-5925-82fc-015770c5143c at NET_0x200000a000035_UUID:0/0 lens 448/400 e 0 to 0 dl 1278393053 ref 1 fl Interpret:/0/0 rc 0/0
Jul  6 15:10:09 helium kernel: LustreError: 6649:0:(ost_handler.c:829:ost_brw_read()) Skipped 2 previous similar messages
Jul  6 15:10:13 helium kernel: Lustre: 6649:0:(ost_handler.c:886:ost_brw_read()) AC3-OST0008: ignoring bulk IO comm error with e6ce7565-0b86-5925-82fc-015770c5143c at NET_0x200000a000035_UUID id 12345-10.0.0.53 at tcp - client will retry
Jul  6 15:10:13 helium kernel: Lustre: 6649:0:(ost_handler.c:886:ost_brw_read()) Skipped 2 previous similar messages
Jul  6 15:10:17 helium kernel: Lustre: 7016:0:(ldlm_lib.c:540:target_handle_reconnect()) AC3-OST0008: e6ce7565-0b86-5925-82fc-015770c5143c reconnecting
Jul  6 15:10:17 helium kernel: LustreError: 6708:0:(ldlm_lockd.c:305:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.0.0.54 at tcp  ns: filter-AC3-OST0006_UUID lock: ffff8103a644e200/0xded0540147d4c8c7 lrc: 3/0,0 mode: PR/PR res: 8432287/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0x6c318a80ee850f9f expref: 878 pid: 7043 timeout 4896983243
Jul  6 15:10:17 helium kernel: LustreError: 6708:0:(ldlm_lockd.c:305:waiting_locks_callback()) Skipped 1 previous similar message
Jul  6 15:10:17 helium kernel: LustreError: 6715:0:(ldlm_lib.c:1848:target_send_reply_msg()) @@@ processing error (-107)  req at ffff81027fcc5400 x1340041378188643/t0 o4-><?>@<?>:0/0 lens 448/0 e 0 to 0 dl 1278393069 ref 1 fl Interpret:/0/0 rc -107/0
Jul  6 15:10:19 helium kernel: LustreError: 0:0:(ldlm_lockd.c:305:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 10.0.0.54 at tcp  ns: filter-AC3-OST0008_UUID lock: ffff810489298400/0xded0540147d4c8b9 lrc: 3/0,0 mode: PR/PR res: 8127131/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0x6c318a80ee850fbb expref: 34 pid: 6592 timeout 4896985868
Jul  6 15:10:19 helium kernel: LustreError: 6730:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT  req at ffff810355aeec00 x1340041378188706/t0 o3->593eb160-edce-8ceb-3f93-6f743cacd1a1 at NET_0x200000a000036_UUID:0/0 lens 448/400 e 0 to 0 dl 1278393069 ref 1 fl Interpret:/0/0 rc 0/0
Jul  6 15:10:19 helium kernel: LustreError: 6665:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT  req at ffff81069e442400 x1340041378188736/t0 o3->593eb160-edce-8ceb-3f93-6f743cacd1a1 at NET_0x200000a000036_UUID:0/0 lens 448/400 e 0 to 0 dl 1278393069 ref 1 fl Interpret:/0/0 rc 0/0
Jul  6 15:10:19 helium kernel: LustreError: 6665:0:(ost_handler.c:825:ost_brw_read()) Skipped 5 previous similar messages
Jul  6 15:10:20 helium kernel: Lustre: 6730:0:(ost_handler.c:886:ost_brw_read()) AC3-OST0008: ignoring bulk IO comm error with 593eb160-edce-8ceb-3f93-6f743cacd1a1 at NET_0x200000a000036_UUID id 12345-10.0.0.54 at tcp - client will retry
Jul  6 15:10:20 helium kernel: LustreError: 6714:0:(ldlm_lib.c:1848:target_send_reply_msg()) @@@ processing error (-107)  req at ffff8106965e0800 x1340041378196268/t0 o3-><?>@<?>:0/0 lens 448/0 e 0 to 0 dl 1278393072 ref 1 fl Interpret:/0/0 rc -107/0
Jul  6 15:11:10 helium kernel: Lustre: 7119:0:(ldlm_lib.c:540:target_handle_reconnect()) AC3-OST0009: d9a4606b-2d46-7e5b-c67d-c05610d8af95 reconnecting
Jul  6 15:18:00 helium kernel: Lustre: 6979:0:(ldlm_lib.c:540:target_handle_reconnect()) AC3-OST0009: e6ce7565-0b86-5925-82fc-015770c5143c reconnecting
Jul  6 15:30:40 helium kernel: Lustre: 7119:0:(client.c:1434:ptlrpc_expire_one_request()) @@@ Request x1339860850745218 sent from AC3-OST0006 to NID 10.0.0.54 at tcp 7s ago has timed out (7s prior to deadline).
Jul  6 15:30:40 helium kernel:   req at ffff8104fd018000 x1339860850745218/t0 o104->@NET_0x200000a000036_UUID:15/16 lens 296/384 e 0 to 1 dl 1278394240 ref 2 fl Rpc:N/0/0 rc 0/0
Jul  6 15:30:40 helium kernel: LustreError: 138-a: AC3-OST0006: A client on nid 10.0.0.54 at tcp was evicted due to a lock blocking callback to 10.0.0.54 at tcp timed out: rc -107
Jul  6 15:30:40 helium kernel: LustreError: 7119:0:(ldlm_lockd.c:1167:ldlm_handle_enqueue()) ### lock on destroyed export ffff8102688f7400 ns: filter-AC3-OST0006_UUID lock: ffff81012bdcb200/0xded0540147d6bb48 lrc: 3/0,0 mode: --/PW res: 8432457/0 rrc: 2 type: EXT [0->376831] (req 0->376831) flags: 0x0 remote: 0x6c318a80f18346f1 expref: 357 pid: 7119 timeout 0
Jul  6 15:30:40 helium kernel: LustreError: 7119:0:(ldlm_lib.c:1848:target_send_reply_msg()) @@@ processing error (-107)  req at ffff8103870e8400 x1340041383514587/t0 o101->593eb160-edce-8ceb-3f93-6f743cacd1a1 at NET_0x200000a000036_UUID:0/0 lens 296/352 e 0 to 0 dl 1278394284 ref 1 fl Interpret:/0/0 rc -107/0
Jul  6 15:30:41 helium kernel: LustreError: 6679:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT  req at ffff81082a335c50 x1340041383502349/t0 o3->593eb160-edce-8ceb-3f93-6f743cacd1a1 at NET_0x200000a000036_UUID:0/0 lens 448/400 e 0 to 0 dl 1278394307 ref 1 fl Interpret:/0/0 rc 0/0
Jul  6 15:30:41 helium kernel: Lustre: 6679:0:(ost_handler.c:886:ost_brw_read()) AC3-OST0006: ignoring bulk IO comm error with 593eb160-edce-8ceb-3f93-6f743cacd1a1 at NET_0x200000a000036_UUID id 12345-10.0.0.54 at tcp - client will retry
Jul  6 15:30:41 helium kernel: Lustre: 6679:0:(ost_handler.c:886:ost_brw_read()) Skipped 6 previous similar messages
Jul  6 15:30:41 helium kernel: LustreError: 8166:0:(ldlm_lockd.c:1821:ldlm_cancel_handler()) operation 103 from 12345-10.0.0.54 at tcp with bad export cookie 16055425036052657553
Jul  6 15:32:08 helium kernel: Lustre: 7134:0:(client.c:1434:ptlrpc_expire_one_request()) @@@ Request x1339860850745435 sent from AC3-OST0005 to NID 10.0.0.56 at tcp 7s ago has timed out (7s prior to deadline).
Jul  6 15:32:08 helium kernel:   req at ffff810695874000 x1339860850745435/t0 o104->@:15/16 lens 296/384 e 0 to 1 dl 1278394328 ref 2 fl Rpc:N/0/0 rc 0/0
Jul  6 15:32:08 helium kernel: LustreError: 138-a: AC3-OST0005: A client on nid 10.0.0.56 at tcp was evicted due to a lock blocking callback to 10.0.0.56 at tcp timed out: rc -107
Jul  6 15:32:08 helium kernel: LustreError: 7134:0:(ldlm_lockd.c:1167:ldlm_handle_enqueue()) ### lock on destroyed export ffff8102552f5400 ns: filter-AC3-OST0005_UUID lock: ffff810485fade00/0xded0540147d6ead5 lrc: 3/0,0 mode: --/PW res: 8108669/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x0 remote: 0xd1f8ae5995b5c83b expref: 620 pid: 7134 timeout 0
Jul  6 15:32:08 helium kernel: LustreError: 7134:0:(ldlm_lib.c:1848:target_send_reply_msg()) @@@ processing error (-107)  req at ffff81009ddd6800 x1340491441560517/t0 o101->8a1862dd-8de6-9414-f71c-0c85925e1e20@:0/0 lens 296/352 e 0 to 0 dl 1278394372 ref 1 fl Interpret:/0/0 rc -107/0
Jul  6 15:32:08 helium kernel: LustreError: 6610:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT  req at ffff810357887c00 x1340491441559786/t0 o3->8a1862dd-8de6-9414-f71c-0c85925e1e20@:0/0 lens 448/400 e 0 to 0 dl 1278394396 ref 1 fl Interpret:/0/0 rc 0/0
Jul  6 15:32:08 helium kernel: LustreError: 6610:0:(ost_handler.c:825:ost_brw_read()) Skipped 3 previous similar messages
Jul  6 15:32:08 helium kernel: Lustre: 6610:0:(ost_handler.c:886:ost_brw_read()) AC3-OST0005: ignoring bulk IO comm error with 8a1862dd-8de6-9414-f71c-0c85925e1e20@ id 12345-10.0.0.56 at tcp - client will retry
Jul  6 15:32:08 helium kernel: Lustre: 6610:0:(ost_handler.c:886:ost_brw_read()) Skipped 3 previous similar messages
Jul  6 15:32:08 helium kernel: LustreError: 6720:0:(ost_handler.c:825:ost_brw_read()) @@@ Eviction on bulk PUT  req at ffff81040de33000 x1340491441560035/t0 o3->8a1862dd-8de6-9414-f71c-0c85925e1e20@:0/0 lens 448/400 e 0 to 0 dl 1278394396 ref 1 fl Interpret:/0/0 rc 0/0
Jul  6 15:32:16 helium kernel: LustreError: 12001:0:(ldlm_lockd.c:1821:ldlm_cancel_handler()) operation 103 from 12345-10.0.0.56 at tcp with bad export cookie 16055425036052176730
Jul  6 15:32:19 helium kernel: Lustre: 6720:0:(ost_handler.c:886:ost_brw_read()) AC3-OST0005: ignoring bulk IO comm error with 8a1862dd-8de6-9414-f71c-0c85925e1e20@ id 12345-10.0.0.56 at tcp - client will retry
Jul  6 15:32:21 helium kernel: Lustre: 6712:0:(ost_handler.c:886:ost_brw_read()) AC3-OST0005: ignoring bulk IO comm error with 8a1862dd-8de6-9414-f71c-0c85925e1e20@ id 12345-10.0.0.56 at tcp - client will retry
Jul  6 15:32:21 helium kernel: Lustre: 6712:0:(ost_handler.c:886:ost_brw_read()) Skipped 2 previous similar messages
Jul  6 15:36:56 helium kernel: Lustre: 6970:0:(client.c:1434:ptlrpc_expire_one_request()) @@@ Request x1339860850746067 sent from AC3-OST0006 to NID 10.0.0.57 at tcp 11s ago has timed out (11s prior to deadline).
Jul  6 15:36:56 helium kernel:   req at ffff810086c83000 x1339860850746067/t0 o106->@:15/16 lens 296/424 e 0 to 1 dl 1278394616 ref 2 fl Rpc:/0/0 rc 0/0
Jul  6 15:37:01 helium diskmond: 168:Polling all 48 slots for drive fault 
Jul  6 15:37:10 helium diskmond: sata4/5 device(/dev/sdal) is running bad 
Jul  6 15:37:10 helium diskmond: please back up and replace the disk soon. 
Jul  6 15:38:05 helium kernel: Lustre: 6537:0:(client.c:1434:ptlrpc_expire_one_request()) @@@ Request x1339860850746176 sent from AC3-OST0005 to NID 10.0.0.57 at tcp 8s ago has timed out (8s prior to deadline).
Jul  6 15:38:05 helium kernel:   req at ffff810341fdc800 x1339860850746176/t0 o106->@:15/16 lens 296/424 e 0 to 1 dl 1278394685 ref 2 fl Rpc:/0/0 rc 0/0
Jul  6 15:38:18 helium kernel: Lustre: 6524:0:(client.c:1434:ptlrpc_expire_one_request()) @@@ Request x1339860850746195 sent from AC3-OST0008 to NID 10.0.0.57 at tcp 7s ago has timed out (7s prior to deadline).
Jul  6 15:38:18 helium kernel:   req at ffff810282961400 x1339860850746195/t0 o106->@:15/16 lens 296/424 e 0 to 1 dl 1278394698 ref 2 fl Rpc:/0/0 rc 0/0
Jul  6 15:39:07 helium kernel: Lustre: 6793:0:(client.c:1434:ptlrpc_expire_one_request()) @@@ Request x1339860850746269 sent from AC3-OST0006 to NID 10.0.0.57 at tcp 11s ago has timed out (11s prior to deadline).
Jul  6 15:39:07 helium kernel:   req at ffff81027ce5b400 x1339860850746269/t0 o106->@:15/16 lens 296/424 e 0 to 1 dl 1278394747 ref 2 fl Rpc:/0/0 rc 0/0
Jul  6 15:40:19 helium kernel: Lustre: 6851:0:(client.c:1434:ptlrpc_expire_one_request()) @@@ Request x1339860850746437 sent from AC3-OST0005 to NID 10.0.0.57 at tcp 8s ago has timed out (8s prior to deadline).
Jul  6 15:40:19 helium kernel:   req at ffff8104135bb800 x1339860850746437/t0 o106->@:15/16 lens 296/424 e 0 to 1 dl 1278394819 ref 2 fl Rpc:/0/0 rc 0/0
Jul  6 15:42:10 helium kernel: Lustre: 6935:0:(client.c:1434:ptlrpc_expire_one_request()) @@@ Request x1339860850746615 sent from AC3-OST0005 to NID 10.0.0.53 at tcp 7s ago has timed out (7s prior to deadline).
Jul  6 15:42:10 helium kernel:   req at ffff81025b227c00 x1339860850746615/t0 o106->@NET_0x200000a000035_UUID:15/16 lens 296/424 e 0 to 1 dl 1278394930 ref 2 fl Rpc:/0/0 rc 0/0
Jul  6 15:46:01 helium kernel: Lustre: 6957:0:(client.c:1434:ptlrpc_expire_one_request()) @@@ Request x1339860850746990 sent from AC3-OST0007 to NID 172.16.16.124 at tcp 7s ago has timed out (7s prior to deadline).
Jul  6 15:46:01 helium kernel:   req at ffff8103fabd9800 x1339860850746990/t0 o106->@NET_0x20000ac10107c_UUID:15/16 lens 296/424 e 0 to 1 dl 1278395161 ref 2 fl Rpc:/0/0 rc 0/0


########################
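
To get a per-client summary of the OSS log above, something like the following could be used. It is only a minimal sketch: the function name tally and the event labels are my own invention, it assumes each event sits on a single syslog line, and it accepts both the "@tcp" form and the " at tcp" form of the NID that appears in this listing.

```python
import re
from collections import Counter

# Matches a client NID such as "10.0.0.54@tcp"; the list archive
# rewrites "@" as " at ", so both spellings are accepted.
NID_RE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3})(?:@| at )tcp")

# Substrings taken from the log above, mapped to labels chosen for this sketch.
EVENTS = {
    "evicting": "evicted",
    "was evicted": "evicted",
    "refuse reconnection": "reconnect refused",
    "has timed out": "request timed out",
}

def tally(lines):
    """Count (nid, event label) pairs over an iterable of syslog lines."""
    counts = Counter()
    for line in lines:
        for needle, label in EVENTS.items():
            if needle in line:
                match = NID_RE.search(line)
                if match:
                    counts[(match.group(1), label)] += 1
                break  # classify each line at most once
    return counts
```

Passing the lines of the OSS messages file to tally() returns a count per (NID, event) pair, which should make it easier to see whether the evictions and refused reconnections cluster on a few clients.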

Here is the output from the client at the same time:

Jul  6 15:10:17 compute-0-3 kernel: LustreError: 11-0: an error occurred while communicating with 172.16.16.2 at tcp. The ost_write operation failed with -107
Jul  6 15:10:17 compute-0-3 kernel: LustreError: Skipped 1 previous similar message
Jul  6 15:10:17 compute-0-3 kernel: LustreError: 167-0: This client was evicted by AC3-OST0006; in progress operations using this service will fail.
Jul  6 15:10:17 compute-0-3 kernel: LustreError: Skipped 4 previous similar messages
Jul  6 15:10:17 compute-0-3 kernel: LustreError: 3095:0:(namei.c:1176:ll_objects_destroy()) obd destroy objid 0x18542c4 at 0x0 error -5
Jul  6 15:10:17 compute-0-3 kernel: LustreError: 6419:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc -5, returning -EIO
Jul  6 15:10:18 compute-0-3 kernel: LustreError: 6779:0:(ldlm_resource.c:518:ldlm_namespace_cleanup()) Namespace AC3-OST0006-osc-ffff81043e754c00 resource refcount nonzero (1) after lock cleanup; forcing cleanup.
Jul  6 15:10:18 compute-0-3 kernel: LustreError: 6779:0:(ldlm_resource.c:523:ldlm_namespace_cleanup()) Resource: ffff810117997500 (8432287/0/0/0) (rc: 1)
Jul  6 15:10:18 compute-0-3 kernel: LustreError: 6687:0:(llite_mmap.c:210:ll_tree_unlock()) couldn't unlock -5
Jul  6 15:10:18 compute-0-3 kernel: LustreError: 6782:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc -5, returning -EIO
Jul  6 15:10:20 compute-0-3 kernel: LustreError: 11-0: an error occurred while communicating with 172.16.16.2 at tcp. The ost_read operation failed with -107
Jul  6 15:10:20 compute-0-3 kernel: LustreError: 167-0: This client was evicted by AC3-OST0008; in progress operations using this service will fail.
Jul  6 15:10:20 compute-0-3 kernel: LustreError: 20660:0:(rw.c:122:ll_brw()) error from obd_brw: rc = -4
Jul  6 15:10:20 compute-0-3 kernel: LustreError: 6784:0:(ldlm_resource.c:518:ldlm_namespace_cleanup()) Namespace AC3-OST0008-osc-ffff81043e754c00 resource refcount nonzero (1) after lock cleanup; forcing cleanup.
Jul  6 15:10:20 compute-0-3 kernel: LustreError: 6784:0:(ldlm_resource.c:523:ldlm_namespace_cleanup()) Resource: ffff81041604b9c0 (5521743/0/0/0) (rc: 1)
Jul  6 15:10:20 compute-0-3 kernel: LustreError: 6686:0:(llite_mmap.c:210:ll_tree_unlock()) couldn't unlock -5
Jul  6 15:10:20 compute-0-3 kernel: LustreError: 3571:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID  req at ffff81042061a800 x1340041378196315/t0 o4->AC3-OST0008_UUID at 172.16.16.2@tcp:6/4 lens 512/624 e 0 to 1 dl 0 ref 2 fl Rpc:/0/0 rc 0/0
Jul  6 15:10:20 compute-0-3 kernel: LustreError: 3571:0:(client.c:858:ptlrpc_import_delay_req()) Skipped 78 previous similar messages
Jul  6 15:10:20 compute-0-3 kernel: LustreError: 6785:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc -5, returning -EIO
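
For anyone reading along, the negative return codes in both logs (-16, -107, -5, -4) are negated Linux errno values. The sketch below simply decodes the ones that appear above. The table is hard-coded with the standard Linux numbers, and the helper name describe is my own, so treat it as an illustration rather than anything Lustre provides.

```python
# Linux errno values for the return codes that appear in the logs above.
LINUX_ERRNO = {
    4: ("EINTR", "Interrupted system call"),
    5: ("EIO", "Input/output error"),
    16: ("EBUSY", "Device or resource busy"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
}

def describe(rc):
    """Return a readable description of a (negative) kernel return code."""
    name, text = LINUX_ERRNO.get(abs(rc), ("UNKNOWN", "not in this table"))
    return f"{rc}: {name} ({text})"

for rc in (-16, -107, -5, -4):
    print(describe(rc))
```

Read this way, -16 on the refused reconnections is EBUSY, -107 after an eviction is ENOTCONN, and the -5 the client returns to applications is the EIO our developers are seeing.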

Kind Regards,

Peter Kitchener
Systems Administrator
Capital Markets CRC Limited (CMCRC)
Telephone: +61 2 8088 4223
Fax: +61 2 8088 4201
www.cmcrc.com 
