[Lustre-discuss] Timeouts and Dumps

Denise Hummel denise_hummel at nrel.gov
Mon Dec 22 07:26:35 PST 2008


HI Isaac;

"   > Dec 19 04:17:28 oss1 kernel: Lustre:
27065:0:(router.c:167:lnet_notify()) Ignoring prediction from
172.16.100.41 at tcp of 172.16.2.47 at tcp down 11396654252 seconds in the
future

Please look in bug 14300 for a fix of this.

Isaac  "

I did not see where this bug would fix all the dumps I am getting -
about 50 a day now:


Dec 19 04:18:18 oss1 kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb())
Watchdog triggered for pid 27277: it was inactive for 100s
Dec 19 04:18:18 oss1 kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb())
Skipped 1 previous similar message
Dec 19 04:18:18 oss1 kernel: Lustre:
0:0:(linux-debug.c:168:libcfs_debug_dumpstack()) showing stack for
process 27277
Dec 19 04:18:18 oss1 kernel: ll_ost_io_67  S 0000010218f55af8     0
27300      1         27301 27299 (L-TLB)
Dec 19 04:18:18 oss1 kernel: 0000010218f55948 0000000000000046
0000010140002030 ffffffff00000073
Dec 19 04:18:18 oss1 kernel:        0000000000000001 00000000a02ba9f2
0000010140067a80 0000000300000004
Dec 19 04:18:18 oss1 kernel:        000001021c220800 00000000000000a2
Dec 19 04:18:18 oss1 kernel: Call Trace:<4>Lustre:
0:0:(linux-debug.c:168:libcfs_debug_dumpstack()) Skipped 2 previous
similar messages
Dec 19 04:18:18 oss1 kernel: ll_ost_io_44  S 0000010198895af8     0
27277      1         27278 27276 (L-TLB)
Dec 19 04:18:18 oss1 kernel: 0000010198895948 0000000000000046
0000010037ef1030 ffffffff00000073
Dec 19 04:18:18 oss1 kernel:        0000000000000001 00000000a02ba9f2
000001014005fa80 0000000100000004
Dec 19 04:18:18 oss1 kernel:        00000102380b8030 000000000000010b
Dec 19 04:18:18 oss1 kernel: Call Trace:<ffffffff8013f448>{__mod_timer
+293} <ffffffff8013f448>{__mod_timer+293}
<ffffffff80321053>{schedule_timeout+367}
Dec 19 04:18:18 oss1 kernel:        <ffffffff80321053>{schedule_timeout
+367}
Dec 19 04:18:18 oss1 kernel:        <ffffffff8013fe78>{process_timeout
+0} <ffffffffa0576998>{:ost:ost_brw_write+7448}
Dec 19 04:18:18 oss1 kernel:        <ffffffffa0574ac0>{:ost:ost_brw_read
+8528} <ffffffff8013fe78>{process_timeout+0}
<ffffffffa0576998>{:ost:ost_brw_write+7448} 
Dec 19 04:18:18 oss1 kernel:        <ffffffffa0574ac0>{:ost:ost_brw_read
+8528}<ffffffff80133566>{default_wake_function+0}
Dec 19 04:18:18 oss1 kernel:
<ffffffffa039e125>{:ptlrpc:lustre_msg_check_version+69}
Dec 19 04:18:18 oss1 kernel:
<ffffffffa0571590>{:ost:ost_bulk_timeout+0}
<ffffffffa057c4ab>{:ost:ost_handle+12187}
Dec 19 04:18:18 oss1 kernel:        <ffffffff80131923>{recalc_task_prio
+337} <ffffffffa02beb58>{:lnet:lnet_match_blocked_msg+920}
Dec 19 04:18:18 oss1 kernel:
<ffffffffa03a699e>{:ptlrpc:ptlrpc_server_handle_request+2830}
Dec 19 04:18:18 oss1 kernel:
<ffffffffa028c02e>{:libcfs:lcw_update_time+30}
<ffffffff8013f448>{__mod_timer+293}
Dec 19 04:18:18 oss1 kernel:
<ffffffffa03a8fa8>{:ptlrpc:ptlrpc_main+2456}
<ffffffff80133566>{default_wake_function+0}
Dec 19 04:18:18 oss1 kernel:
<ffffffffa03a7660>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa03a7660>{:ptlrpc:ptlrpc_retry_rqbds+0}
Dec 19 04:18:18 oss1 kernel:        <ffffffff80110de3>{child_rip+8}
<ffffffffa03a8610>{:ptlrpc:ptlrpc_main+0}
Dec 19 04:18:18 oss1 kernel:        <ffffffff80110ddb>{child_rip+0}
Dec 19 04:18:18 oss1 kernel: LustreError: dumping log
to /tmp/lustre-log.1229685498.27300
Dec 19 04:18:19 oss1 kernel:  <ffffffff80133566>{default_wake_function
+0}
Dec 19 04:18:19 oss1 kernel:
<ffffffffa039e125>{:ptlrpc:lustre_msg_check_version+69}
Dec 19 04:18:19 oss1 kernel:
<ffffffffa0571590>{:ost:ost_bulk_timeout+0}
<ffffffffa057c4ab>{:ost:ost_handle+12187}
Dec 19 04:18:19 oss1 kernel:        <ffffffff80131923>{recalc_task_prio
+337} <ffffffffa02beb58>{:lnet:lnet_match_blocked_msg+920}
Dec 19 04:18:19 oss1 kernel:
<ffffffffa03a699e>{:ptlrpc:ptlrpc_server_handle_request+2830}
Dec 19 04:18:19 oss1 kernel:
<ffffffffa028c02e>{:libcfs:lcw_update_time+30}
<ffffffff8013f448>{__mod_timer+293}
Dec 19 04:18:19 oss1 kernel:
<ffffffffa03a8fa8>{:ptlrpc:ptlrpc_main+2456}
<ffffffff80133566>{default_wake_function+0}
Dec 19 04:18:19 oss1 kernel:
<ffffffffa03a7660>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa03a7660>{:ptlrpc:ptlrpc_retry_rqbds+0}
Dec 19 04:18:19 oss1 kernel:        <ffffffff80110de3>{child_rip+8}
<ffffffffa03a8610>{:ptlrpc:ptlrpc_main+0}
Dec 19 04:18:19 oss1 kernel:        <ffffffff80110ddb>{child_rip+0}
Dec 19 04:18:19 oss1 kernel: ll_ost_io_93  S 000001016dafbaf8     0
27326      1         27327 27325 (L-TLB)
Dec 19 04:18:19 oss1 kernel: 000001016dafb948 0000000000000046
00000102380b8030 ffffffff00000073
Dec 19 04:18:19 oss1 kernel:        0000000000000001 00000000a02ba9f2
000001014005fa80 0000000100000004
Dec 19 04:18:19 oss1 kernel:        00000101d1dd4800 00000000000000c8
Dec 19 04:18:19 oss1 kernel: Call Trace:<ffffffff8013f448>{__mod_timer
+293} <ffffffff80321053>{schedule_timeout+367}
Dec 19 04:18:19 oss1 kernel:        <ffffffff8013fe78>{process_timeout
+0} <ffffffffa0576998>{:ost:ost_brw_write+7448}
Dec 19 04:18:19 oss1 kernel:        <ffffffffa0574ac0>{:ost:ost_brw_read
+8528} <ffffffff80133566>{default_wake_function+0}
Dec 19 04:18:19 oss1 kernel:
<ffffffffa039e125>{:ptlrpc:lustre_msg_check_version+69}
Dec 19 04:18:19 oss1 kernel:
<ffffffffa0571590>{:ost:ost_bulk_timeout+0}
<ffffffffa057c4ab>{:ost:ost_handle+12187}
Dec 19 04:18:19 oss1 kernel:        <ffffffff80131923>{recalc_task_prio
+337} <ffffffffa02beb58>{:lnet:lnet_match_blocked_msg+920}
Dec 19 04:18:19 oss1 kernel:
<ffffffffa03a699e>{:ptlrpc:ptlrpc_server_handle_request+2830}
Dec 19 04:18:19 oss1 kernel:
<ffffffffa028c02e>{:libcfs:lcw_update_time+30}
<ffffffff8013f448>{__mod_timer+293}
Dec 19 04:18:19 oss1 kernel:
<ffffffffa03a8fa8>{:ptlrpc:ptlrpc_main+2456}
<ffffffff80133566>{default_wake_function+0}
Dec 19 04:18:19 oss1 kernel:
<ffffffffa03a7660>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa03a7660>{:ptlrpc:ptlrpc_retry_rqbds+0}
Dec 19 04:18:19 oss1 kernel:        <ffffffff80110de3>{child_rip+8}
<ffffffffa03a8610>{:ptlrpc:ptlrpc_main+0}
Dec 19 04:18:19 oss1 kernel:        <ffffffff80110ddb>{child_rip+0}
Dec 19 04:18:19 oss1 kernel: ll_ost_io_89  S 00000101bbe01af8     0
27322      1         27323 27321 (L-TLB)
Dec 19 04:18:19 oss1 kernel: 00000101bbe01948 0000000000000046
0000010037ef1030 ffffffff00000073
Dec 19 04:18:19 oss1 kernel:        0000000000000001 00000000a02ba9f2
0000010140067a80 0000000300000004
Dec 19 04:18:19 oss1 kernel:        0000010239280800 0000000000000092
Dec 19 04:18:19 oss1 kernel: Call Trace:0000000000000001
00000000a02ba9f2 000001014005fa80 0000000100000004
Dec 19 04:18:19 oss1 kernel:        0000010227575800 00000000000000d5
Dec 19 04:18:19 oss1 kernel: Call Trace:<ffffffff8013f448>{__mod_timer
+293} <ffffffff8013f448>{__mod_timer+293}
<ffffffff80321053>{schedule_timeout+367}
Dec 19 04:18:19 oss1 kernel:        <ffffffff80321053>{schedule_timeout
+367}
Dec 19 04:18:19 oss1 kernel:        <ffffffff8013fe78>{process_timeout
+0} <ffffffffa0576998>{:ost:ost_brw_write+7448}
Dec 19 04:18:19 oss1 kernel:        <ffffffffa0574ac0>{:ost:ost_brw_read
+8528} <ffffffff8013fe78>{process_timeout+0}
<ffffffffa0576998>{:ost:ost_brw_write+7448} 
Dec 19 04:18:19 oss1 kernel:        <ffffffffa0574ac0>{:ost:ost_brw_read
+8528} <ffffffff80133566>{default_wake_function+0}
Dec 19 04:18:19 oss1 kernel:
<ffffffffa039e125>{:ptlrpc:lustre_msg_check_version+69}
Dec 19 04:18:19 oss1 kernel:
<ffffffffa0571590>{:ost:ost_bulk_timeout+0}
<ffffffffa057c4ab>{:ost:ost_handle
+12187}<ffffffff80133566>{default_wake_function+0}

... and much more of the same - it is in the log I sent with the
original email.  With these dumps I end up with timeouts and evictions
and noticeable hangs until recovery.
Does the fix for bug 14300 fix this issue also?

Thanks,
Denise




More information about the lustre-discuss mailing list