[Lustre-discuss] Why are there many threads named "ll_imp_inval" in client?
huangql
huangql at ihep.ac.cn
Mon Aug 17 00:23:31 PDT 2009
Hi all,
Our system ran well for the past two weeks. However, we found that some computing nodes have a large number of threads named "ll_imp_inval", and the load average on these clients (computing nodes) rises to 28. As a result, users can't submit jobs to them. I read the source file (import.c), and as I understand it, an ll_imp_inval thread is started each time ptlrpc-connect-import or ptlrpc-import-recovery runs. So if something is wrong on the server or the client, the thread never exits. Is that right?
We ran 'ps -aux | grep ll_imp_inval' and got the following results:
root 22568 0.0 0.0 0 0 ? D Aug13 0:00 [ll_imp_inval]
root 22569 0.0 0.0 0 0 ? D Aug13 0:00 [ll_imp_inval]
root 22570 0.0 0.0 0 0 ? D Aug13 0:00 [ll_imp_inval]
root 22571 0.0 0.0 0 0 ? D Aug13 0:00 [ll_imp_inval]
...
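The threads above are all in state D (uninterruptible sleep). Linux counts D-state tasks toward the load average just like runnable ones, so a few dozen stuck threads would be enough to explain a load of about 28 even with idle CPUs. Below is a minimal sketch for counting them on a node. It is not a Lustre tool. It only reads /proc/<pid>/stat, and it assumes the thread name is exactly "ll_imp_inval", as in the ps output. The function names are hypothetical.

```python
import os

def parse_proc_stat(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    The comm field is wrapped in parentheses and may itself contain
    spaces or ')', so we split on the LAST ')' instead of on whitespace.
    """
    lpar = line.index("(")
    rpar = line.rindex(")")
    pid = int(line[:lpar].strip())
    comm = line[lpar + 1:rpar]
    state = line[rpar + 2]  # single state character follows ") "
    return pid, comm, state

def stuck_threads(name="ll_imp_inval", proc="/proc"):
    """Return sorted PIDs of tasks named `name` in uninterruptible sleep (D)."""
    pids = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo, etc.
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_proc_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # task exited between listdir() and open(), or odd content
        if comm == name and state == "D":
            pids.append(pid)
    return sorted(pids)

if __name__ == "__main__":
    pids = stuck_threads()
    print(f"{len(pids)} ll_imp_inval threads in D state: {pids}")
```

Counting the threads only confirms the symptom. To see where they are blocked, "echo t > /proc/sysrq-trigger" (as root, with sysrq enabled) dumps every task's kernel stack to the kernel log.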
We checked the log and found the main messages below. On other nodes we also see "client evicted" messages:
Aug 13 08:57:55 bws0211 kernel: Lustre: Request x5103879 sent from testfs-OST001a-osc-f7dcfe00 to NID 192.168.50.80 at tcp 500s ago has timed out (limit 500s).
Aug 13 08:57:55 bws0211 kernel: Lustre: Skipped 6 previous similar messages
Aug 13 08:58:34 bws0211 kernel: Lustre: testfs-OST0018-osc-f7dcfe00: Connection restored to service testfs-OST0018 using nid 192.168.50.79 at tcp.
Aug 13 09:00:00 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 51s
Aug 13 09:00:00 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 5 previous similar messages
Aug 13 09:00:00 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 09:00:00 bws0211 kernel: LustreError: Skipped 2 previous similar messages
Aug 13 09:00:00 bws0211 kernel: LustreError: 167-0: This client was evicted by testfs-OST001c; in progress operations using this service will fail.
Aug 13 09:00:00 bws0211 kernel: Lustre: testfs-OST001c-osc-f7dcfe00: Connection restored to service testfs-OST001c using nid 192.168.50.80 at tcp.
Aug 13 09:00:00 bws0211 kernel: Lustre: Skipped 2 previous similar messages
Aug 13 09:02:05 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 09:02:05 bws0211 kernel: LustreError: Skipped 1 previous similar message
Aug 13 09:06:15 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 09:06:15 bws0211 kernel: LustreError: Skipped 3 previous similar messages
Aug 13 09:10:25 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 26s
Aug 13 09:10:25 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 10 previous similar messages
Aug 13 09:12:30 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 09:12:30 bws0211 kernel: LustreError: Skipped 5 previous similar messages
Aug 13 09:20:50 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 51s
Aug 13 09:20:50 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 09:22:55 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 09:22:55 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 09:31:15 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 6s
Aug 13 09:31:15 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 09:33:20 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 09:33:20 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 09:41:40 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 31s
Aug 13 09:41:40 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 09:43:45 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 09:43:45 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 09:52:05 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 51s
Aug 13 09:52:05 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 09:54:10 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 09:54:10 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 10:02:30 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 6s
Aug 13 10:02:30 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 10:04:35 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 10:04:35 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 10:12:55 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 31s
Aug 13 10:12:55 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 10:15:00 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 10:15:00 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 10:23:20 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 51s
Aug 13 10:23:20 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 10:25:25 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 10:25:25 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 10:33:45 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 6s
Aug 13 10:33:45 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 10:35:50 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 10:35:50 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 10:44:10 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 31s
Aug 13 10:44:10 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 10:46:15 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 10:46:15 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 10:54:35 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 51s
Aug 13 10:54:35 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 10:56:40 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 10:56:40 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 11:05:00 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 6s
Aug 13 11:05:00 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 11:07:05 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 11:07:05 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 11:15:25 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 31s
Aug 13 11:15:25 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 11:17:30 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 11:17:30 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 11:25:50 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 51s
Aug 13 11:25:50 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 11:27:55 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 11:27:55 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 11:36:15 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 6s
Aug 13 11:36:15 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 11:38:20 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 11:38:20 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 11:46:40 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 31s
Aug 13 11:46:40 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 11:48:45 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 11:48:45 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 11:57:05 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 51s
Aug 13 11:57:05 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 12:01:15 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The obd_ping operation failed with -107
Aug 13 12:01:15 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 12:01:15 bws0211 kernel: Lustre: testfs-OST001c-osc-f7dcfe00: Connection to service testfs-OST001c via nid 192.168.50.80 at tcp was lost; in progress operations using this service will wait for recovery to complete.
Aug 13 12:01:15 bws0211 kernel: Lustre: Skipped 2 previous similar messages
Aug 13 12:01:15 bws0211 kernel: LustreError: 167-0: This client was evicted by testfs-OST001c; in progress operations using this service will fail.
Aug 13 12:01:15 bws0211 kernel: Lustre: testfs-OST001c-osc-f7dcfe00: Connection restored to service testfs-OST001c using nid 192.168.50.80 at tcp.
Aug 13 12:07:30 bws0211 kernel: Lustre: Request x5245865 sent from testfs-OST001a-osc-f7dcfe00 to NID 192.168.50.80 at tcp 500s ago has timed out (limit 500s).
Aug 13 12:07:30 bws0211 kernel: Lustre: Skipped 2 previous similar messages
Aug 13 12:09:35 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 51s
Aug 13 12:09:35 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 3 previous similar messages
Aug 13 12:09:35 bws0211 kernel: Lustre: testfs-OST001c-osc-f7dcfe00: Connection to service testfs-OST001c via nid 192.168.50.80 at tcp was lost; in progress operations using this service will wait for recovery to complete.
Aug 13 12:09:35 bws0211 kernel: LustreError: 167-0: This client was evicted by testfs-OST001c; in progress operations using this service will fail.
Aug 13 12:09:35 bws0211 kernel: Lustre: testfs-OST001c-osc-f7dcfe00: Connection restored to service testfs-OST001c using nid 192.168.50.80 at tcp.
Aug 13 12:17:55 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The obd_ping operation failed with -107
Aug 13 12:17:55 bws0211 kernel: LustreError: Skipped 3 previous similar messages
Aug 13 12:17:55 bws0211 kernel: Lustre: testfs-OST001c-osc-f7dcfe00: Connection to service testfs-OST001c via nid 192.168.50.80 at tcp was lost; in progress operations using this service will wait for recovery to complete.
Aug 13 12:17:55 bws0211 kernel: LustreError: 167-0: This client was evicted by testfs-OST001c; in progress operations using this service will fail.
Aug 13 12:17:55 bws0211 kernel: Lustre: testfs-OST001c-osc-f7dcfe00: Connection restored to service testfs-OST001c using nid 192.168.50.80 at tcp.
Aug 13 12:20:00 bws0211 kernel: Lustre: Request x5254534 sent from testfs-OST001a-osc-f7dcfe00 to NID 192.168.50.80 at tcp 500s ago has timed out (limit 500s).
Aug 13 12:20:00 bws0211 kernel: Lustre: Skipped 1 previous similar message
Aug 13 12:22:05 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 11s
Aug 13 12:22:05 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 3 previous similar messages
Aug 13 12:28:20 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 12:28:20 bws0211 kernel: LustreError: Skipped 6 previous similar messages
Aug 13 12:32:30 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 36s
Aug 13 12:32:30 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 12:38:45 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 12:38:45 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 12:42:55 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 51s
Aug 13 12:42:55 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 12:49:10 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The obd_ping operation failed with -107
Aug 13 12:49:10 bws0211 kernel: LustreError: Skipped 5 previous similar messages
Aug 13 12:49:10 bws0211 kernel: Lustre: testfs-OST001c-osc-f7dcfe00: Connection to service testfs-OST001c via nid 192.168.50.80 at tcp was lost; in progress operations using this service will wait for recovery to complete.
Aug 13 12:49:10 bws0211 kernel: LustreError: 167-0: This client was evicted by testfs-OST001c; in progress operations using this service will fail.
Aug 13 12:49:10 bws0211 kernel: Lustre: testfs-OST001c-osc-f7dcfe00: Connection restored to service testfs-OST001c using nid 192.168.50.80 at tcp.
Aug 13 12:53:20 bws0211 kernel: Lustre: Request x5277671 sent from testfs-OST001a-osc-f7dcfe00 to NID 192.168.50.80 at tcp 500s ago has timed out (limit 500s).
Aug 13 12:53:20 bws0211 kernel: Lustre: Skipped 1 previous similar message
Aug 13 12:55:25 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 51s
Aug 13 12:55:25 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 3 previous similar messages
Aug 13 12:57:30 bws0211 kernel: Lustre: testfs-OST001c-osc-f7dcfe00: Connection to service testfs-OST001c via nid 192.168.50.80 at tcp was lost; in progress operations using this service will wait for recovery to complete.
Aug 13 12:57:30 bws0211 kernel: LustreError: 167-0: This client was evicted by testfs-OST001c; in progress operations using this service will fail.
Aug 13 12:57:30 bws0211 kernel: Lustre: testfs-OST001c-osc-f7dcfe00: Connection restored to service testfs-OST001c using nid 192.168.50.80 at tcp.
Aug 13 13:03:45 bws0211 kernel: Lustre: Request x5283965 sent from testfs-OST001a-osc-f7dcfe00 to NID 192.168.50.80 at tcp 500s ago has timed out (limit 500s).
Aug 13 13:03:45 bws0211 kernel: Lustre: Skipped 1 previous similar message
Aug 13 13:05:50 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 51s
Aug 13 13:05:50 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 1 previous similar message
Aug 13 13:05:50 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 13:05:50 bws0211 kernel: LustreError: Skipped 1 previous similar message
Aug 13 13:16:15 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 26s
Aug 13 13:16:15 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 13:16:15 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 13:16:15 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 13:26:40 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 51s
Aug 13 13:26:40 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 13:26:40 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 13:26:40 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 13:37:05 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 51s
Aug 13 13:37:05 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 13:37:05 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 13:37:05 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 13:47:30 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 26s
Aug 13 13:47:30 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 13:47:30 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 13:47:30 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 13:57:55 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 51s
Aug 13 13:57:55 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 13:57:55 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 13:57:55 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 14:08:20 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 6s
Aug 13 14:08:20 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 14:08:20 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 14:08:20 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 14:18:45 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 31s
Aug 13 14:18:45 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 14:18:45 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 14:18:45 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 14:29:10 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 51s
Aug 13 14:29:10 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 14:29:10 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 14:29:10 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 14:39:35 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 6s
Aug 13 14:39:35 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 14:39:35 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 14:39:35 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 14:50:00 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 31s
Aug 13 14:50:00 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 9 previous similar messages
Aug 13 14:58:20 bws0211 kernel: Lustre: Request x5363068 sent from testfs-OST001a-osc-f7dcfe00 to NID 192.168.50.80 at tcp 500s ago has timed out (limit 500s).
Aug 13 14:58:20 bws0211 kernel: Lustre: Skipped 1 previous similar message
Aug 13 14:58:20 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The obd_ping operation failed with -107
Aug 13 14:58:20 bws0211 kernel: LustreError: Skipped 9 previous similar messages
Aug 13 14:58:20 bws0211 kernel: Lustre: testfs-OST001c-osc-f7dcfe00: Connection to service testfs-OST001c via nid 192.168.50.80 at tcp was lost; in progress operations using this service will wait for recovery to complete.
Aug 13 14:58:22 bws0211 kernel: Lustre: Request x5363070 sent from testfs-OST001c-osc-f7dcfe00 to NID 192.168.50.80 at tcp 502s ago has timed out (limit 500s).
Aug 13 14:58:22 bws0211 kernel: Lustre: Skipped 1 previous similar message
Aug 13 15:00:25 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 36s
Aug 13 15:00:25 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 1 previous similar message
Aug 13 15:00:27 bws0211 kernel: Lustre: Request x5364501 sent from testfs-OST001c-osc-f7dcfe00 to NID 192.168.50.80 at tcp 502s ago has timed out (limit 500s).
Aug 13 15:02:34 bws0211 kernel: Lustre: Request x5365911 sent from testfs-OST001c-osc-f7dcfe00 to NID 192.168.50.80 at tcp 504s ago has timed out (limit 500s).
Aug 13 15:04:35 bws0211 kernel: Lustre: Request x5367371 sent from testfs-OST001c-osc-f7dcfe00 to NID 192.168.50.80 at tcp 500s ago has timed out (limit 500s).
Aug 13 15:08:45 bws0211 kernel: Lustre: Request x5370730 sent from testfs-OST001a-osc-f7dcfe00 to NID 192.168.50.80 at tcp 500s ago has timed out (limit 500s).
Aug 13 15:10:51 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 41s
Aug 13 15:10:51 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 1 previous similar message
Aug 13 15:19:11 bws0211 kernel: Lustre: Request x5377804 sent from testfs-OST001a-osc-f7dcfe00 to NID 192.168.50.80 at tcp 500s ago has timed out (limit 500s).
Aug 13 15:19:11 bws0211 kernel: Lustre: Skipped 1 previous similar message
Aug 13 15:21:16 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 46s
Aug 13 15:21:16 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 1 previous similar message
Aug 13 15:23:20 bws0211 kernel: Lustre: Request x5369047 sent from testfs-OST001c-osc-f7dcfe00 to NID 192.168.50.80 at tcp 1500s ago has timed out (limit 1500s).
Aug 13 15:23:20 bws0211 kernel: Lustre: Skipped 1 previous similar message
Aug 13 15:29:36 bws0211 kernel: Lustre: Request x5384919 sent from testfs-OST001a-osc-f7dcfe00 to NID 192.168.50.80 at tcp 500s ago has timed out (limit 500s).
Aug 13 15:31:40 bws0211 kernel: Lustre: Request x5386289 sent from testfs-OST001c-osc-f7dcfe00 to NID 192.168.50.80 at tcp 500s ago has timed out (limit 500s).
Aug 13 15:31:40 bws0211 kernel: Lustre: Skipped 1 previous similar message
Aug 13 15:31:41 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001c-osc-f7dcfe00: tried all connections, increasing latency to 6s
Aug 13 15:31:41 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 3 previous similar messages
Aug 13 15:37:56 bws0211 kernel: Lustre: Request x5390694 sent from testfs-OST001a-osc-f7dcfe00 to NID 192.168.50.80 at tcp 500s ago has timed out (limit 500s).
Aug 13 15:40:01 bws0211 kernel: Lustre: Request x5393807 sent from testfs-OST001c-osc-f7dcfe00 to NID 192.168.50.80 at tcp 500s ago has timed out (limit 500s).
Aug 13 15:40:01 bws0211 kernel: Lustre: Skipped 1 previous similar message
Aug 13 15:42:06 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001c-osc-f7dcfe00: tried all connections, increasing latency to 11s
Aug 13 15:42:06 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 2 previous similar messages
Aug 13 15:46:16 bws0211 kernel: Lustre: Request x5412456 sent from testfs-OST001a-osc-f7dcfe00 to NID 192.168.50.80 at tcp 500s ago has timed out (limit 500s).
Aug 13 15:50:26 bws0211 kernel: Lustre: Request x5425369 sent from testfs-OST001c-osc-f7dcfe00 to NID 192.168.50.80 at tcp 500s ago has timed out (limit 500s).
Aug 13 15:50:26 bws0211 kernel: Lustre: Skipped 1 previous similar message
Aug 13 15:52:31 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001c-osc-f7dcfe00: tried all connections, increasing latency to 16s
Aug 13 15:52:31 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 2 previous similar messages
Aug 13 15:56:41 bws0211 kernel: Lustre: Request x5444459 sent from testfs-OST001a-osc-f7dcfe00 to NID 192.168.50.80 at tcp 500s ago has timed out (limit 500s).
Aug 13 16:02:56 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001c-osc-f7dcfe00: tried all connections, increasing latency to 21s
Aug 13 16:02:56 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 2 previous similar messages
Aug 13 16:05:01 bws0211 kernel: Lustre: Request x5469816 sent from testfs-OST001a-osc-f7dcfe00 to NID 192.168.50.80 at tcp 500s ago has timed out (limit 500s).
Aug 13 16:05:01 bws0211 kernel: Lustre: Skipped 2 previous similar messages
Aug 13 16:07:06 bws0211 kernel: LustreError: 4236:0:(import.c:756:ptlrpc_connect_interpret()) testfs-OST001a_UUID went back in time (transno 22127117 was previously committed, server now claims 22127112)! See https://bugzilla.clusterfs.com/long_list.cgi?buglist=9646
Aug 13 16:07:06 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 16:09:11 bws0211 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.80 at tcp. The ost_connect operation failed with -19
Aug 13 16:13:21 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001b-osc-f7dcfe00: tried all connections, increasing latency to 16s
Aug 13 16:13:21 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 4 previous similar messages
Aug 13 16:21:41 bws0211 kernel: Lustre: Request x5519899 sent from testfs-OST001b-osc-f7dcfe00 to NID 192.168.50.80 at tcp 500s ago has timed out (limit 500s).
Aug 13 16:21:41 bws0211 kernel: Lustre: Skipped 2 previous similar messages
Aug 13 16:23:46 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001b-osc-f7dcfe00: tried all connections, increasing latency to 21s
Aug 13 16:23:46 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 1 previous similar message
Aug 13 16:32:06 bws0211 kernel: Lustre: Request x5501056 sent from testfs-OST001a-osc-f7dcfe00 to NID 192.168.50.80 at tcp 1500s ago has timed out (limit 1500s).
Aug 13 16:32:06 bws0211 kernel: Lustre: Skipped 1 previous similar message
Aug 13 16:40:26 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 6s
Aug 13 16:40:26 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 3 previous similar messages
Aug 13 16:48:46 bws0211 kernel: Lustre: Request x5599100 sent from testfs-OST001a-osc-f7dcfe00 to NID 192.168.50.80 at tcp 500s ago has timed out (limit 500s).
Aug 13 16:48:46 bws0211 kernel: Lustre: Skipped 5 previous similar messages
Aug 13 16:55:03 bws0211 kernel: Lustre: setting import testfs-OST001b_UUID INACTIVE by administrator request
Aug 13 16:55:03 bws0211 kernel: Lustre: Skipped 1 previous similar message
Aug 13 16:55:03 bws0211 kernel: Lustre: testfs-OST001b-osc-f7dcfe00.osc: set parameter active=0
Aug 13 16:55:03 bws0211 kernel: Lustre: Skipped 8 previous similar messages
Aug 13 16:57:06 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) testfs-OST001a-osc-f7dcfe00: tried all connections, increasing latency to 16s
Aug 13 16:57:06 bws0211 kernel: Lustre: 4237:0:(import.c:395:import_select_connection()) Skipped 5 previous similar messages
Aug 13 16:57:06 bws0211 kernel: LustreError: 167-0: This client was evicted by testfs-OST001a; in progress operations using this service will fail.
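The log above is long and repetitive, so a tally is easier to read. On Linux, -19 is ENODEV and -107 is ENOTCONN. The repeated "ost_connect ... -19" failures would then mean the server at 192.168.50.80 answered that the target device was not available. The "obd_ping ... -107" failures, each followed by an eviction from testfs-OST001c, would be consistent with the server no longer recognizing this client's connection. The sketch below is hypothetical: the regexes are written against the message formats in this log only, and the errno names assume Linux numbering.

```python
import re
from collections import Counter

# Linux errno names for the codes that appear in this log (assumes Linux numbering).
ERRNO_NAMES = {19: "ENODEV", 107: "ENOTCONN"}

OP_FAILED = re.compile(r"The (\w+) operation failed with -(\d+)")
EVICTED = re.compile(r"This client was evicted by ([^;\s]+);")
TIMED_OUT = re.compile(r"Request x\d+ sent from (\S+) to NID")

def summarize(lines):
    """Tally failed operations, evictions and request timeouts in a client log."""
    failures, evictions, timeouts = Counter(), Counter(), Counter()
    for line in lines:
        if m := OP_FAILED.search(line):
            errno = int(m.group(2))
            failures[(m.group(1), ERRNO_NAMES.get(errno, str(errno)))] += 1
        elif m := EVICTED.search(line):
            evictions[m.group(1)] += 1
        elif m := TIMED_OUT.search(line):
            timeouts[m.group(1)] += 1
    return failures, evictions, timeouts

if __name__ == "__main__":
    import sys
    for counter in summarize(sys.stdin):
        for key, n in counter.most_common():
            print(n, key)
```

This counts printed lines only. The "Skipped N previous similar messages" lines show that the real totals are higher.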
Thank you in advance for your help. I hope to hear from you soon.
Best wishes,
Sarea
2009-08-17
huangql