[Lustre-discuss] Timeouts and Dumps

Denise Hummel denise_hummel at nrel.gov
Fri Dec 19 07:42:16 PST 2008


Hi;

I have started getting numerous dump logs, timeouts and client
evictions.  Our environment:
200 compute nodes
The MDT/MGS is on a x86_64 server with 8G memory and 2 dual core AMD
procs
The OSS is on a x86_64 server with 8G memory and 2 dual core AMD procs
One OST raid 6 ~ 8TB (recently shrunk from 9TB)
Lustre 1.6.4.2

The timeouts are causing our login node to hang long enough to be
noticeable to the users before recovery.  I have attached a couple of
the logs and the syslog from this morning.  All of the logs and messages
are pretty similar.

If anyone has any suggestions, I would appreciate it.

Thanks,
Denise
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lustre-log.1229661071.27314
Type: application/octet-stream
Size: 9840 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20081219/55dcf04a/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lustre-log.1229685498.27300.gz
Type: application/x-gzip
Size: 797686 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20081219/55dcf04a/attachment.bin>
-------------- next part --------------
Dec 19 04:17:28 oss1 kernel: Lustre: 27065:0:(router.c:167:lnet_notify()) Ignoring prediction from 172.16.100.41 at tcp of 172.16.2.47 at tcp down 11396654252 seconds in the future
Dec 19 04:17:28 oss1 kernel: LustreError: 27065:0:(socklnd.c:1599:ksocknal_destroy_conn()) Completing partial receive from 12345-172.16.2.47 at tcp, ip 172.16.2.47:1021, with error
Dec 19 04:17:28 oss1 kernel: LustreError: 27065:0:(socklnd.c:1599:ksocknal_destroy_conn()) Skipped 5 previous similar messages
Dec 19 04:17:28 oss1 kernel: LustreError: 27065:0:(events.c:307:server_bulk_callback()) event type 2, status -5, desc 00000101b3b2a000
Dec 19 04:17:28 oss1 kernel: LustreError: 27311:0:(ost_handler.c:1073:ost_brw_write()) @@@ network error on bulk GET 0(1048576)  req at 000001013b81ec00 x1689415/t0 o4->bece0307-a4e2-5aa1-3b02-673feb3c5e7e at NET_0x20000ac10022f_UUID:-1 lens 384/352 ref 0 fl Interpret:/0/0 rc 0/0
Dec 19 04:17:28 oss1 kernel: LustreError: 27311:0:(ost_handler.c:1073:ost_brw_write()) Skipped 5 previous similar messages
Dec 19 04:17:28 oss1 kernel: Lustre: 27311:0:(ost_handler.c:1205:ost_brw_write()) lustre-OST0000: ignoring bulk IO comm error with bece0307-a4e2-5aa1-3b02-673feb3c5e7e at NET_0x20000ac10022f_UUID id 12345-172.16.2.47 at tcp - client will retry
Dec 19 04:17:28 oss1 kernel: Lustre: 27311:0:(ost_handler.c:1205:ost_brw_write()) Skipped 15 previous similar messages
Dec 19 04:17:28 oss1 kernel: Lustre: 27138:0:(ldlm_lib.c:519:target_handle_reconnect()) lustre-OST0000: bece0307-a4e2-5aa1-3b02-673feb3c5e7e reconnecting
Dec 19 04:17:28 oss1 kernel: Lustre: 27138:0:(ldlm_lib.c:519:target_handle_reconnect()) Skipped 20 previous similar messages
Dec 19 04:17:28 oss1 kernel: Lustre: 27138:0:(ldlm_lib.c:747:target_handle_connect()) lustre-OST0000: refuse reconnection from bece0307-a4e2-5aa1-3b02-673feb3c5e7e at 172.16.2.47@tcp to 0x00000101e69aa000; still busy with 8 active RPCs
Dec 19 04:17:28 oss1 kernel: Lustre: 27138:0:(ldlm_lib.c:747:target_handle_connect()) Skipped 1 previous similar message
Dec 19 04:17:28 oss1 kernel: LustreError: 27138:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-16)  req at 000001023c7bfe00 x1689445/t0 o8->bece0307-a4e2-5aa1-3b02-673feb3c5e7e at NET_0x20000ac10022f_UUID:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
Dec 19 04:17:40 oss1 kernel: Lustre: 27065:0:(router.c:167:lnet_notify()) Ignoring prediction from 172.16.100.41 at tcp of 172.16.2.47 at tcp down 11397159540 seconds in the future
Dec 19 04:18:18 oss1 kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb()) Watchdog triggered for pid 27277: it was inactive for 100s
Dec 19 04:18:18 oss1 kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb()) Skipped 1 previous similar message
Dec 19 04:18:18 oss1 kernel: Lustre: 0:0:(linux-debug.c:168:libcfs_debug_dumpstack()) showing stack for process 27277
Dec 19 04:18:18 oss1 kernel: ll_ost_io_67  S 0000010218f55af8     0 27300      1         27301 27299 (L-TLB)
Dec 19 04:18:18 oss1 kernel: 0000010218f55948 0000000000000046 0000010140002030 ffffffff00000073 
Dec 19 04:18:18 oss1 kernel:        0000000000000001 00000000a02ba9f2 0000010140067a80 0000000300000004 
Dec 19 04:18:18 oss1 kernel:        000001021c220800 00000000000000a2 
Dec 19 04:18:18 oss1 kernel: Call Trace:<4>Lustre: 0:0:(linux-debug.c:168:libcfs_debug_dumpstack()) Skipped 2 previous similar messages
Dec 19 04:18:18 oss1 kernel: ll_ost_io_44  S 0000010198895af8     0 27277      1         27278 27276 (L-TLB)
Dec 19 04:18:18 oss1 kernel: 0000010198895948 0000000000000046 0000010037ef1030 ffffffff00000073 
Dec 19 04:18:18 oss1 kernel:        0000000000000001 00000000a02ba9f2 000001014005fa80 0000000100000004 
Dec 19 04:18:18 oss1 kernel:        00000102380b8030 000000000000010b 
Dec 19 04:18:18 oss1 kernel: Call Trace:<ffffffff8013f448>{__mod_timer+293} <ffffffff8013f448>{__mod_timer+293} <ffffffff80321053>{schedule_timeout+367} 
Dec 19 04:18:18 oss1 kernel:        <ffffffff80321053>{schedule_timeout+367} 
Dec 19 04:18:18 oss1 kernel:        <ffffffff8013fe78>{process_timeout+0} <ffffffffa0576998>{:ost:ost_brw_write+7448} 
Dec 19 04:18:18 oss1 kernel:        <ffffffffa0574ac0>{:ost:ost_brw_read+8528} <ffffffff8013fe78>{process_timeout+0} <ffffffffa0576998>{:ost:ost_brw_write+7448} 
Dec 19 04:18:18 oss1 kernel:        <ffffffffa0574ac0>{:ost:ost_brw_read+8528}<ffffffff80133566>{default_wake_function+0} 
Dec 19 04:18:18 oss1 kernel:        <ffffffffa039e125>{:ptlrpc:lustre_msg_check_version+69} 
Dec 19 04:18:18 oss1 kernel:        <ffffffffa0571590>{:ost:ost_bulk_timeout+0} <ffffffffa057c4ab>{:ost:ost_handle+12187} 
Dec 19 04:18:18 oss1 kernel:        <ffffffff80131923>{recalc_task_prio+337} <ffffffffa02beb58>{:lnet:lnet_match_blocked_msg+920} 
Dec 19 04:18:18 oss1 kernel:        <ffffffffa03a699e>{:ptlrpc:ptlrpc_server_handle_request+2830} 
Dec 19 04:18:18 oss1 kernel:        <ffffffffa028c02e>{:libcfs:lcw_update_time+30} <ffffffff8013f448>{__mod_timer+293} 
Dec 19 04:18:18 oss1 kernel:        <ffffffffa03a8fa8>{:ptlrpc:ptlrpc_main+2456} <ffffffff80133566>{default_wake_function+0} 
Dec 19 04:18:18 oss1 kernel:        <ffffffffa03a7660>{:ptlrpc:ptlrpc_retry_rqbds+0} <ffffffffa03a7660>{:ptlrpc:ptlrpc_retry_rqbds+0} 
Dec 19 04:18:18 oss1 kernel:        <ffffffff80110de3>{child_rip+8} <ffffffffa03a8610>{:ptlrpc:ptlrpc_main+0} 
Dec 19 04:18:18 oss1 kernel:        <ffffffff80110ddb>{child_rip+0} 
Dec 19 04:18:18 oss1 kernel: LustreError: dumping log to /tmp/lustre-log.1229685498.27300
Dec 19 04:18:19 oss1 kernel:  <ffffffff80133566>{default_wake_function+0} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa039e125>{:ptlrpc:lustre_msg_check_version+69} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa0571590>{:ost:ost_bulk_timeout+0} <ffffffffa057c4ab>{:ost:ost_handle+12187} 
Dec 19 04:18:19 oss1 kernel:        <ffffffff80131923>{recalc_task_prio+337} <ffffffffa02beb58>{:lnet:lnet_match_blocked_msg+920} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa03a699e>{:ptlrpc:ptlrpc_server_handle_request+2830} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa028c02e>{:libcfs:lcw_update_time+30} <ffffffff8013f448>{__mod_timer+293} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa03a8fa8>{:ptlrpc:ptlrpc_main+2456} <ffffffff80133566>{default_wake_function+0} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa03a7660>{:ptlrpc:ptlrpc_retry_rqbds+0} <ffffffffa03a7660>{:ptlrpc:ptlrpc_retry_rqbds+0} 
Dec 19 04:18:19 oss1 kernel:        <ffffffff80110de3>{child_rip+8} <ffffffffa03a8610>{:ptlrpc:ptlrpc_main+0} 
Dec 19 04:18:19 oss1 kernel:        <ffffffff80110ddb>{child_rip+0} 
Dec 19 04:18:19 oss1 kernel: ll_ost_io_93  S 000001016dafbaf8     0 27326      1         27327 27325 (L-TLB)
Dec 19 04:18:19 oss1 kernel: 000001016dafb948 0000000000000046 00000102380b8030 ffffffff00000073 
Dec 19 04:18:19 oss1 kernel:        0000000000000001 00000000a02ba9f2 000001014005fa80 0000000100000004 
Dec 19 04:18:19 oss1 kernel:        00000101d1dd4800 00000000000000c8 
Dec 19 04:18:19 oss1 kernel: Call Trace:<ffffffff8013f448>{__mod_timer+293} <ffffffff80321053>{schedule_timeout+367} 
Dec 19 04:18:19 oss1 kernel:        <ffffffff8013fe78>{process_timeout+0} <ffffffffa0576998>{:ost:ost_brw_write+7448} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa0574ac0>{:ost:ost_brw_read+8528} <ffffffff80133566>{default_wake_function+0} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa039e125>{:ptlrpc:lustre_msg_check_version+69} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa0571590>{:ost:ost_bulk_timeout+0} <ffffffffa057c4ab>{:ost:ost_handle+12187} 
Dec 19 04:18:19 oss1 kernel:        <ffffffff80131923>{recalc_task_prio+337} <ffffffffa02beb58>{:lnet:lnet_match_blocked_msg+920} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa03a699e>{:ptlrpc:ptlrpc_server_handle_request+2830} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa028c02e>{:libcfs:lcw_update_time+30} <ffffffff8013f448>{__mod_timer+293} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa03a8fa8>{:ptlrpc:ptlrpc_main+2456} <ffffffff80133566>{default_wake_function+0} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa03a7660>{:ptlrpc:ptlrpc_retry_rqbds+0} <ffffffffa03a7660>{:ptlrpc:ptlrpc_retry_rqbds+0} 
Dec 19 04:18:19 oss1 kernel:        <ffffffff80110de3>{child_rip+8} <ffffffffa03a8610>{:ptlrpc:ptlrpc_main+0} 
Dec 19 04:18:19 oss1 kernel:        <ffffffff80110ddb>{child_rip+0} 
Dec 19 04:18:19 oss1 kernel: ll_ost_io_89  S 00000101bbe01af8     0 27322      1         27323 27321 (L-TLB)
Dec 19 04:18:19 oss1 kernel: 00000101bbe01948 0000000000000046 0000010037ef1030 ffffffff00000073 
Dec 19 04:18:19 oss1 kernel:        ll_ost_io_31  S 000001016257daf8     0 27264      1         27265 27263 (L-TLB)
Dec 19 04:18:19 oss1 kernel: 000001016257d948 0000000000000046 0000010140002030 ffffffff00000073 
Dec 19 04:18:19 oss1 kernel:        0000000000000001 00000000a02ba9f2 0000010140067a80 0000000300000004 
Dec 19 04:18:19 oss1 kernel:        0000010239280800 0000000000000092 
Dec 19 04:18:19 oss1 kernel: Call Trace:0000000000000001 00000000a02ba9f2 000001014005fa80 0000000100000004 
Dec 19 04:18:19 oss1 kernel:        0000010227575800 00000000000000d5 
Dec 19 04:18:19 oss1 kernel: Call Trace:<ffffffff8013f448>{__mod_timer+293} <ffffffff8013f448>{__mod_timer+293} <ffffffff80321053>{schedule_timeout+367} 
Dec 19 04:18:19 oss1 kernel:        <ffffffff80321053>{schedule_timeout+367} 
Dec 19 04:18:19 oss1 kernel:        <ffffffff8013fe78>{process_timeout+0} <ffffffffa0576998>{:ost:ost_brw_write+7448} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa0574ac0>{:ost:ost_brw_read+8528} <ffffffff8013fe78>{process_timeout+0} <ffffffffa0576998>{:ost:ost_brw_write+7448} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa0574ac0>{:ost:ost_brw_read+8528} <ffffffff80133566>{default_wake_function+0} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa039e125>{:ptlrpc:lustre_msg_check_version+69} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa0571590>{:ost:ost_bulk_timeout+0} <ffffffffa057c4ab>{:ost:ost_handle+12187}<ffffffff80133566>{default_wake_function+0} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa039e125>{:ptlrpc:lustre_msg_check_version+69} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa0571590>{:ost:ost_bulk_timeout+0} <ffffffffa057c4ab>{:ost:ost_handle+12187} 
Dec 19 04:18:19 oss1 kernel:         
Dec 19 04:18:19 oss1 kernel:        <ffffffff80131923>{recalc_task_prio+337} <ffffffffa02beb58>{:lnet:lnet_match_blocked_msg+920} 
Dec 19 04:18:19 oss1 kernel:        <ffffffff80131923>{recalc_task_prio+337}<ffffffffa03a699e>{:ptlrpc:ptlrpc_server_handle_request+2830} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa028c02e>{:libcfs:lcw_update_time+30}  <ffffffff801335b7>{__wake_up_common+67}<ffffffff8013f448>{__mod_timer+293} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa03a8fa8>{:ptlrpc:ptlrpc_main+2456}  
Dec 19 04:18:19 oss1 kernel:        <ffffffffa02beb58>{:lnet:lnet_match_blocked_msg+920} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa03a699e>{:ptlrpc:ptlrpc_server_handle_request+2830} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa028c02e>{:libcfs:lcw_update_time+30} <ffffffff80133566>{default_wake_function+0} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa03a7660>{:ptlrpc:ptlrpc_retry_rqbds+0} <ffffffffa03a7660>{:ptlrpc:ptlrpc_retry_rqbds+0} 
Dec 19 04:18:19 oss1 kernel:        <ffffffff80110de3>{child_rip+8} <ffffffffa03a8610>{:ptlrpc:ptlrpc_main+0} <ffffffff8013f448>{__mod_timer+293} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa03a8fa8>{:ptlrpc:ptlrpc_main+2456} 
Dec 19 04:18:19 oss1 kernel:        <ffffffff80110ddb>{child_rip+0} 
Dec 19 04:18:19 oss1 kernel: ll_ost_io_01  S 0000010237121af8     0 27234      1         27235 27233 (L-TLB)
Dec 19 04:18:19 oss1 kernel: 0000010237121948 0000000000000046 0000010227575800 ffffffff00000073 
Dec 19 04:18:19 oss1 kernel:        0000000000000001 00000000a02ba9f2 000001014005fa80 0000000100000004 
Dec 19 04:18:19 oss1 kernel:        0000010237fb5800 00000000000000c0 
Dec 19 04:18:19 oss1 kernel: Call Trace:<ffffffff80133566>{default_wake_function+0} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa03a7660>{:ptlrpc:ptlrpc_retry_rqbds+0} <ffffffffa03a7660>{:ptlrpc:ptlrpc_retry_rqbds+0} 
Dec 19 04:18:19 oss1 kernel:        <ffffffff80110de3>{child_rip+8} <ffffffffa03a8610>{:ptlrpc:ptlrpc_main+0} 
Dec 19 04:18:19 oss1 kernel:        <ffffffff80110ddb>{child_rip+0} 
Dec 19 04:18:19 oss1 kernel: <ffffffff8013f448>{__mod_timer+293} <ffffffff80321053>{schedule_timeout+367} 
Dec 19 04:18:19 oss1 kernel:        <ffffffff8013fe78>{process_timeout+0} <ffffffffa0576998>{:ost:ost_brw_write+7448} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa0574ac0>{:ost:ost_brw_read+8528} <ffffffff80133566>{default_wake_function+0} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa039e125>{:ptlrpc:lustre_msg_check_version+69} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa0571590>{:ost:ost_bulk_timeout+0} <ffffffffa057c4ab>{:ost:ost_handle+12187} 
Dec 19 04:18:19 oss1 kernel:        <ffffffff80131923>{recalc_task_prio+337} <ffffffffa02beb58>{:lnet:lnet_match_blocked_msg+920} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa03a699e>{:ptlrpc:ptlrpc_server_handle_request+2830} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa028c02e>{:libcfs:lcw_update_time+30} <ffffffff8013f448>{__mod_timer+293} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa03a8fa8>{:ptlrpc:ptlrpc_main+2456} <ffffffff80133566>{default_wake_function+0} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa03a7660>{:ptlrpc:ptlrpc_retry_rqbds+0} <ffffffffa03a7660>{:ptlrpc:ptlrpc_retry_rqbds+0} 
Dec 19 04:18:19 oss1 kernel:        <ffffffff80110de3>{child_rip+8} <ffffffffa03a8610>{:ptlrpc:ptlrpc_main+0} 
Dec 19 04:18:19 oss1 kernel:        <ffffffff80110ddb>{child_rip+0} 
Dec 19 04:18:19 oss1 kernel: ll_ost_io_02  S 000001014d971af8     0 27235      1         27236 27234 (L-TLB)
Dec 19 04:18:19 oss1 kernel: 000001014d971948 0000000000000046 0000010140002030 ffffffff00000073 
Dec 19 04:18:19 oss1 kernel:        0000000000000001 00000000a02ba9f2 0000010140067a80 0000000300000004 
Dec 19 04:18:19 oss1 kernel:        0000010237fb5030 000000000000009c 
Dec 19 04:18:19 oss1 kernel: Call Trace:<ffffffff8013f448>{__mod_timer+293} <ffffffff80321053>{schedule_timeout+367} 
Dec 19 04:18:19 oss1 kernel:        <ffffffff8013fe78>{process_timeout+0} <ffffffffa0576998>{:ost:ost_brw_write+7448} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa0574ac0>{:ost:ost_brw_read+8528} <ffffffff80133566>{default_wake_function+0} 
Dec 19 04:18:20 oss1 kernel:        <ffffffffa039e125>{:ptlrpc:lustre_msg_check_version+69} 
Dec 19 04:18:20 oss1 kernel:        <ffffffffa0571590>{:ost:ost_bulk_timeout+0} <ffffffffa057c4ab>{:ost:ost_handle+12187} 
Dec 19 04:18:20 oss1 kernel:        <ffffffff80131923>{recalc_task_prio+337} <ffffffffa02beb58>{:lnet:lnet_match_blocked_msg+920} 
Dec 19 04:18:20 oss1 kernel:        <ffffffffa03a699e>{:ptlrpc:ptlrpc_server_handle_request+2830} 
Dec 19 04:18:20 oss1 kernel:        <ffffffffa028c02e>{:libcfs:lcw_update_time+30} <ffffffff8013f448>{__mod_timer+293} 
Dec 19 04:18:20 oss1 kernel:        <ffffffffa03a8fa8>{:ptlrpc:ptlrpc_main+2456} <ffffffff80133566>{default_wake_function+0} 
Dec 19 04:18:20 oss1 kernel:        <ffffffffa03a7660>{:ptlrpc:ptlrpc_retry_rqbds+0} <ffffffffa03a7660>{:ptlrpc:ptlrpc_retry_rqbds+0} 
Dec 19 04:18:20 oss1 kernel:        <ffffffff80110de3>{child_rip+8} <ffffffffa03a8610>{:ptlrpc:ptlrpc_main+0} 
Dec 19 04:18:20 oss1 kernel:        <ffffffff80110ddb>{child_rip+0} 
Dec 19 04:18:20 oss1 kernel: LustreError: dumping log to /tmp/lustre-log.1229685498.27235
Dec 19 04:18:20 oss1 kernel: LustreError: dumping log to /tmp/lustre-log.1229685498.27234
Dec 19 04:18:20 oss1 kernel: LustreError: dumping log to /tmp/lustre-log.1229685498.27264
Dec 19 04:18:20 oss1 kernel: LustreError: dumping log to /tmp/lustre-log.1229685498.27322
Dec 19 04:18:20 oss1 kernel: LustreError: dumping log to /tmp/lustre-log.1229685498.27326
Dec 19 04:18:20 oss1 kernel: LustreError: dumping log to /tmp/lustre-log.1229685498.27277
Dec 19 04:20:48 oss1 kernel: LustreError: 27300:0:(ost_handler.c:1062:ost_brw_write()) @@@ timeout on bulk GET  req at 000001013b83c800 x1689418/t0 o4->bece0307-a4e2-5aa1-3b02-673feb3c5e7e at NET_0x20000ac10022f_UUID:-1 lens 384/352 ref 0 fl Interpret:/0/0 rc 0/0
Dec 19 04:20:48 oss1 kernel: LustreError: 27300:0:(ost_handler.c:1062:ost_brw_write()) Skipped 1 previous similar message
Dec 19 04:20:48 oss1 kernel: Lustre: 27300:0:(ost_handler.c:1205:ost_brw_write()) lustre-OST0000: ignoring bulk IO comm error with bece0307-a4e2-5aa1-3b02-673feb3c5e7e at NET_0x20000ac10022f_UUID id 12345-172.16.2.47 at tcp - client will retry
Dec 19 04:20:48 oss1 kernel: Lustre: 27300:0:(watchdog.c:312:lcw_update_time()) Expired watchdog for pid 27300 disabled after 250.0272s
Dec 19 04:20:48 oss1 kernel: Lustre: 27300:0:(watchdog.c:312:lcw_update_time()) Skipped 1 previous similar message
Dec 19 04:25:04 oss1 kernel: Lustre: 27182:0:(ldlm_lib.c:519:target_handle_reconnect()) lustre-OST0000: 8789d360-6d52-0f7f-3558-f376de52aaf3 reconnecting
Dec 19 04:27:53 oss1 kernel: Lustre: 27173:0:(ldlm_lib.c:519:target_handle_reconnect()) lustre-OST0000: bece0307-a4e2-5aa1-3b02-673feb3c5e7e reconnecting
Dec 19 07:45:56 oss1 kernel: Lustre: 27065:0:(router.c:167:lnet_notify()) Ignoring prediction from 172.16.100.41 at tcp of 172.16.2.13 at tcp down 11521628444 seconds in the future
Dec 19 07:47:59 oss1 kernel: LustreError: 138-a: lustre-OST0000: A client on nid 172.16.2.13 at tcp was evicted due to a lock glimpse callback to 172.16.2.13 at tcp timed out: rc -107


More information about the lustre-discuss mailing list