[Lustre-discuss] lustre-1.6.5.1 kernel panic
Wojciech Turek
wjt27 at cam.ac.uk
Thu Sep 25 11:35:17 PDT 2008
Hi,
I upgraded our test lustre file system to the latest 1.6.5.1 version
available from the SUN website.
I have one OSS with one OST and one MDS with combined MGS and MDT
Both servers are running RHEL4 x86_64 and
2.6.9-67.0.7.EL_lustre.1.6.5.1smp kernel, the interconnect is infiniband
and I am using ib modules provided with lustre.
When I mount filesystem and then start writing to it OSS crashes with
kernel panic, see log below:
Lustre: 0:0:(watchdog.c:130:lcw_cb()) Watchdog triggered for pid 17398:
it was inactive for 200s
Lustre: 0:0:(linux-debug.c:167:libcfs_debug_dumpstack()) showing stack
for process 17397
ll_ost_io_92 D 0000000000000002 0 17397 1 17398 17396
(L-TLB)
00000101156bf538 0000000000000046 00000101956bf616 ffffffff801ece0f
0000000000000000 ffffff0010776340 000001010e14c6c0 0000000100000001
0000010113f90030 000000000012b585
Call Trace:ll_ost_io_82 D 000001012ab79400 0 17387 1
17388 17386 (L-TLB)
000001011c252d88 0000000000000046 ffffffffa000288c 0000010115b213c0
0000000000000246 00000100cf851c00 000001012bafa940 0000000200000000
000001010f71f030 0000000000000814
Call Trace:<ffffffffa000288c>{:scsi_mod:scsi_done+0}
<ffffffff801ece0f>{vsnprintf+1406} <ffffffff8024f658>{elv_next_request+238}
<ffffffffa0007df8>{:scsi_mod:scsi_request_fn+1100}
<ffffffff8030cc1f>{__down+147}
<ffffffff80133804>{default_wake_function+0}
<ffffffffa067b484>{:ko2iblnd:kiblnd_init_tx_msg+308}
<ffffffff8030e2f6>{io_schedule+38}
<ffffffff80179e24>{__wait_on_buffer+125}
<ffffffff80179caa>{bh_wake_function+0}
<ffffffff80179caa>{bh_wake_function+0}
<ffffffffa07cad2b>{:ldiskfs:ldiskfs_mb_init_cache+635}
<ffffffff8030e73d>{__down_failed+53}
<ffffffffa06c6670>{:lquota:filter_quota_check+0}
<ffffffffa0843acf>{:obdfilter:.text.lock.filter_io_26+35}
<ffffffffa083b01e>{:obdfilter:filter_commitrw+126}
<ffffffff80158b8e>{add_to_page_cache+167}
<ffffffffa07cb450>{:ldiskfs:ldiskfs_mb_load_buddy+304}
<ffffffffa07cc7b4>{:ldiskfs:ldiskfs_mb_regular_allocator+1028}
<4>Lustre: 0:0:(watchdog.c:130:lcw_cb()) Watchdog triggered for
pid 17388: it was inactive for 200s
Lustre: 0:0:(watchdog.c:130:lcw_cb()) Skipped 2 previous similar messages
Lustre: 0:0:(linux-debug.c:167:libcfs_debug_dumpstack()) showing stack
for process 17388
Lustre: 0:0:(linux-debug.c:167:libcfs_debug_dumpstack()) Skipped 1
previous similar message
ll_ost_io_83 D 0000000000000002 0 17388 1 17389 17387
(L-TLB)
000001011c35f538 0000000000000046 000001019c35f616 ffffffff801ece0f
0000000000000000 ffffff0010776508 000001010e14c6c0 0000000000000001
000001011c336800 00000000000de7ad
Call Trace:<ffffffff801351dc>{autoremove_wake_function+0}
<ffffffffa07d10fd>{:ldiskfs:ldiskfs_mb_new_blocks+333}
<ffffffff801351dc>{autoremove_wake_function+0}
<ffffffffa0814eb4>{:fsfilt_ldiskfs:ldiskfs_ext_new_extent_cb+884}
<ffffffffa028e06a>{:ib_ipath:ipath_verbs_send+1883}
<ffffffffa07c6d7f>{:ldiskfs:ldiskfs_ext_find_extent+255}
<ffffffffa07c8972>{:ldiskfs:ldiskfs_ext_walk_space+482}
<ffffffffa0814b40>{:fsfilt_ldiskfs:ldiskfs_ext_new_extent_cb+0}
<ffffffffa0815343>{:fsfilt_ldiskfs:fsfilt_map_nblocks+307}
<ffffffffa028560f>{:ib_ipath:ipath_do_send+1852}
<ffffffff8013f734>{__mod_timer+293}
<ffffffffa08155bb>{:fsfilt_ldiskfs:fsfilt_ldiskfs_map_ext_inode_pages+539}
<ffffffff801ece0f>{vsnprintf+1406}
<ffffffff801ece0f>{vsnprintf+1406}
<ffffffffa067cb38>{:ko2iblnd:kiblnd_check_sends+2040}
<ffffffffa08410b4>{:obdfilter:filter_direct_io+1108}
<ffffffffa01ba524>{:jbd:journal_start+223}
<ffffffffa081384b>{:fsfilt_ldiskfs:fsfilt_ldiskfs_brw_start+763}
<ffffffffa0842c1d>{:obdfilter:filter_commitrw_write+4957}
<ffffffffa04f5539>{:lvfs:pop_ctxt+505}
<ffffffff8030e4ce>{schedule_timeout+411}
<ffffffffa07f8688>{:ost:ost_checksum_bulk+200}
<ffffffffa083b01e>{:obdfilter:filter_commitrw+126}
<ffffffffa07fddd1>{:ost:ost_brw_write+9505}
<ffffffffa07f87ee>{:ost:ost_checksum_bulk+558}
<ffffffffa07f8688>{:ost:ost_checksum_bulk+200}
<ffffffffa07fddd1>{:ost:ost_brw_write+9505}
<ffffffff80133804>{default_wake_function+0}
<ffffffff80133804>{default_wake_function+0}
<ffffffffa06015ef>{:ptlrpc:lustre_msg_get_version+95}
<ffffffffa06015ef>{:ptlrpc:lustre_msg_get_version+95}
<ffffffffa07f8060>{:ost:ost_bulk_timeout+0}
<ffffffffa07f8060>{:ost:ost_bulk_timeout+0}
<ffffffffa06016e5>{:ptlrpc:lustre_msg_check_version+69}
<ffffffffa06016e5>{:ptlrpc:lustre_msg_check_version+69}
<ffffffffa07f8060>{:ost:ost_bulk_timeout+0}
<ffffffffa07f8060>{:ost:ost_bulk_timeout+0}
<ffffffffa080364d>{:ost:ost_handle+11661}
<ffffffffa080364d>{:ost:ost_handle+11661}
<ffffffff8015c830>{__rmqueue+218}
<ffffffff8015c830>{__rmqueue+218}
<ffffffffa060a451>{:ptlrpc:ptlrpc_check_req+17}
<ffffffffa060a451>{:ptlrpc:ptlrpc_check_req+17}
<ffffffffa060c629>{:ptlrpc:ptlrpc_server_handle_request+2457}
<ffffffffa060c629>{:ptlrpc:ptlrpc_server_handle_request+2457}
<ffffffffa04df45e>{:libcfs:lcw_update_time+30}
<ffffffffa04df45e>{:libcfs:lcw_update_time+30}
<ffffffff80133855>{__wake_up_common+67}
<ffffffffa060ed05>{:ptlrpc:ptlrpc_main+3989}
<ffffffffa060d270>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa060d270>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffff8013f734>{__mod_timer+293}
<ffffffffa060d270>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa060ed05>{:ptlrpc:ptlrpc_main+3989}
<ffffffff80110de3>{child_rip+8} <ffffffffa060dd70>{:ptlrpc:ptlrpc_main+0}
<ffffffff80110ddb>{child_rip+0}
<ffffffff8030cc1f>{__down+147}
<ffffffff80133804>{default_wake_function+0}
<ffffffffa060d270>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa060d270>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffff80110de3>{child_rip+8}
<ffffffffa060dd70>{:ptlrpc:ptlrpc_main+0}
<ffffffff80110ddb>{child_rip+0}
ll_ost_io_84 D 0000000000000002 0 17389 1 17390 17388
(L-TLB)
000001011ac6d538 0000000000000046 000001019ac6d616 ffffffff801ece0f
0000000000000000 ffffff0010776340 000001010e14c6c0 0000000100000001
000001011c336030 000000000012c37b
Call Trace:<ffffffff80133804>{default_wake_function+0}
<ffffffffa067b484>{:ko2iblnd:kiblnd_init_tx_msg+308}
<ffffffff801ece0f>{vsnprintf+1406}
<ffffffff8030e73d>{__down_failed+53}
<ffffffffa06c6670>{:lquota:filter_quota_check+0}
<ffffffffa0843acf>{:obdfilter:.text.lock.filter_io_26+35}
<ffffffff80131bc7>{recalc_task_prio+337}
<ffffffffa05160a0>{:lnet:lnet_send+2544}
<ffffffffa083b01e>{:obdfilter:filter_commitrw+126}
<ffffffff8030cc1f>{__down+147}
<ffffffff80133804>{default_wake_function+0}
<ffffffffa067b484>{:ko2iblnd:kiblnd_init_tx_msg+308}
<ffffffff8030e4ce>{schedule_timeout+411}
<ffffffffa07f8688>{:ost:ost_checksum_bulk+200}
<ffffffffa07fddd1>{:ost:ost_brw_write+9505}
<ffffffff80133804>{default_wake_function+0}
<ffffffffa06015ef>{:ptlrpc:lustre_msg_get_version+95}
<ffffffffa07f8060>{:ost:ost_bulk_timeout+0}
<ffffffffa06016e5>{:ptlrpc:lustre_msg_check_version+69}
<ffffffffa07f8060>{:ost:ost_bulk_timeout+0}
<ffffffffa080364d>{:ost:ost_handle+11661}
<ffffffffa0516b48>{:lnet:lnet_match_blocked_msg+920}
<ffffffff8015c830>{__rmqueue+218}
<ffffffffa060a451>{:ptlrpc:ptlrpc_check_req+17}
<ffffffffa060c629>{:ptlrpc:ptlrpc_server_handle_request+2457}
<ffffffffa04df45e>{:libcfs:lcw_update_time+30}
<ffffffff8030e73d>{__down_failed+53}
<ffffffffa06c6670>{:lquota:filter_quota_check+0}
<ffffffffa0843acf>{:obdfilter:.text.lock.filter_io_26+35}
<ffffffff80133855>{__wake_up_common+67}
<ffffffffa060ed05>{:ptlrpc:ptlrpc_main+3989}
<ffffffffa060d270>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa060d270>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa060d270>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffff80110de3>{child_rip+8}
<ffffffffa060dd70>{:ptlrpc:ptlrpc_main+0}
<ffffffff80131bc7>{recalc_task_prio+337}
<ffffffff80110ddb>{child_rip+0}
ll_ost_io_91 D<ffffffffa05160a0>{:lnet:lnet_send+2544}
0000000000000002 0 17396 1 17397 17395 (L-TLB)
0000010113fb3538 0000000000000046 0000010193fb3616 ffffffff801ece0f
0000000000000000 ffffff0010776638 000001010e14c6c0 0000000000000001
0000010113f90800 00000000000d4d86
Call Trace:<ffffffffa083b01e>{:obdfilter:filter_commitrw+126}
<ffffffff801ece0f>{vsnprintf+1406}
<ffffffff8030e4ce>{schedule_timeout+411}
<ffffffffa07f8688>{:ost:ost_checksum_bulk+200}
<ffffffffa07fddd1>{:ost:ost_brw_write+9505}
<ffffffff80133804>{default_wake_function+0}
<ffffffffa06015ef>{:ptlrpc:lustre_msg_get_version+95}
<ffffffffa07f8060>{:ost:ost_bulk_timeout+0}
<ffffffffa06016e5>{:ptlrpc:lustre_msg_check_version+69}
<ffffffffa07f8060>{:ost:ost_bulk_timeout+0}
<ffffffffa080364d>{:ost:ost_handle+11661}
<ffffffffa0516ad7>{:lnet:lnet_match_blocked_msg+807}
<ffffffffa0516b48>{:lnet:lnet_match_blocked_msg+920}
<ffffffffa060a451>{:ptlrpc:ptlrpc_check_req+17}
<ffffffffa060c629>{:ptlrpc:ptlrpc_server_handle_request+2457}
<ffffffffa04df45e>{:libcfs:lcw_update_time+30}
<ffffffff80133855>{__wake_up_common+67}
<ffffffffa060ed05>{:ptlrpc:ptlrpc_main+3989}
<ffffffffa060d270>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa060d270>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa060d270>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffff80110de3>{child_rip+8}
<ffffffffa060dd70>{:ptlrpc:ptlrpc_main+0}
<ffffffff80110ddb>{child_rip+0}
ll_ost_io_107 D 0000000000000002 0 17412 1 17413 17411
(L-TLB)
000001010ea6f538 0000000000000046 000001018ea6f616 ffffffff801ece0f
0000000000000000 ffffff0010776768 000001010e14c6c0 0000000100000001
000001010ea48800 000000000012d2f5
Call Trace:<ffffffff8030cc1f>{__down+147}
<ffffffff80133804>{default_wake_function+0}
<ffffffffa067b484>{:ko2iblnd:kiblnd_init_tx_msg+308}
<ffffffff801ece0f>{vsnprintf+1406}
<ffffffff8030e73d>{__down_failed+53}
<ffffffffa06c6670>{:lquota:filter_quota_check+0}
<ffffffffa0843acf>{:obdfilter:.text.lock.filter_io_26+35}
<ffffffffa083b01e>{:obdfilter:filter_commitrw+126}
<ffffffff8030cc1f>{__down+147}
<ffffffff80133804>{default_wake_function+0}
<ffffffffa067b484>{:ko2iblnd:kiblnd_init_tx_msg+308}
<ffffffff8030e4ce>{schedule_timeout+411}
<ffffffffa07f8688>{:ost:ost_checksum_bulk+200}
<ffffffffa07fddd1>{:ost:ost_brw_write+9505}
<ffffffff80133804>{default_wake_function+0}
<ffffffffa06015ef>{:ptlrpc:lustre_msg_get_version+95}
<ffffffffa07f8060>{:ost:ost_bulk_timeout+0}
<ffffffffa06016e5>{:ptlrpc:lustre_msg_check_version+69}
<ffffffffa07f8060>{:ost:ost_bulk_timeout+0}
<ffffffffa080364d>{:ost:ost_handle+11661}
<ffffffff8015c830>{__rmqueue+218}
<ffffffff8030e73d>{__down_failed+53}
<ffffffffa06c6670>{:lquota:filter_quota_check+0}
<ffffffffa0843acf>{:obdfilter:.text.lock.filter_io_26+35}
<ffffffffa060a451>{:ptlrpc:ptlrpc_check_req+17}
<ffffffffa060c629>{:ptlrpc:ptlrpc_server_handle_request+2457}
<ffffffffa04df45e>{:libcfs:lcw_update_time+30}
<ffffffff80131bc7>{recalc_task_prio+337}
<ffffffffa05160a0>{:lnet:lnet_send+2544}
<ffffffffa083b01e>{:obdfilter:filter_commitrw+126}
<ffffffff8013f734>{__mod_timer+293}
<ffffffffa060ed05>{:ptlrpc:ptlrpc_main+3989}
<ffffffff80133804>{default_wake_function+0}
<ffffffffa060d270>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa060d270>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffff80110de3>{child_rip+8}
<ffffffffa060dd70>{:ptlrpc:ptlrpc_main+0}
<ffffffff80110ddb>{child_rip+0}
<ffffffff8030e4ce>{schedule_timeout+411}
<ffffffffa07f8688>{:ost:ost_checksum_bulk+200}
<ffffffffa07fddd1>{:ost:ost_brw_write+9505}
<ffffffff80133804>{default_wake_function+0}
<ffffffffa06015ef>{:ptlrpc:lustre_msg_get_version+95}
<ffffffffa07f8060>{:ost:ost_bulk_timeout+0}
<ffffffffa06016e5>{:ptlrpc:lustre_msg_check_version+69}
<ffffffffa07f8060>{:ost:ost_bulk_timeout+0}
<ffffffffa080364d>{:ost:ost_handle+11661}
<ffffffffa0516b48>{:lnet:lnet_match_blocked_msg+920}
<ffffffffa060a451>{:ptlrpc:ptlrpc_check_req+17}
<ffffffffa060c629>{:ptlrpc:ptlrpc_server_handle_request+2457}
<ffffffffa04df45e>{:libcfs:lcw_update_time+30}
<ffffffff8013f734>{__mod_timer+293}
<ffffffffa060ed05>{:ptlrpc:ptlrpc_main+3989}
<ffffffff80133804>{default_wake_function+0}
<ffffffffa060d270>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa060d270>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffff80110de3>{child_rip+8}
<ffffffffa060dd70>{:ptlrpc:ptlrpc_main+0}
<ffffffff80110ddb>{child_rip+0}
ll_ost_io_121 D 0000000000000002 0 17426 1 17427 17425
(L-TLB)
0000010110113538 0000000000000046 0000010190113616 ffffffff801ece0f
0000000000000000 ffffff00107763d8 000001010e14c6c0 0000000100000001
00000101100e6800 0000000000113501
Call Trace:<ffffffff801ece0f>{vsnprintf+1406} <ffffffff8030cc1f>{__down+147}
<ffffffff80133804>{default_wake_function+0}
<ffffffffa067b484>{:ko2iblnd:kiblnd_init_tx_msg+308}
<ffffffff8030e73d>{__down_failed+53}
<ffffffffa06c6670>{:lquota:filter_quota_check+0}
<ffffffffa0843acf>{:obdfilter:.text.lock.filter_io_26+35}
<ffffffffa083b01e>{:obdfilter:filter_commitrw+126}
<ffffffff8030e4ce>{schedule_timeout+411}
<ffffffffa07f8688>{:ost:ost_checksum_bulk+200}
<ffffffffa07fddd1>{:ost:ost_brw_write+9505}
<ffffffff80133804>{default_wake_function+0}
<ffffffffa06015ef>{:ptlrpc:lustre_msg_get_version+95}
<ffffffffa07f8060>{:ost:ost_bulk_timeout+0}
<ffffffffa06016e5>{:ptlrpc:lustre_msg_check_version+69}
<ffffffffa07f8060>{:ost:ost_bulk_timeout+0}
<ffffffffa080364d>{:ost:ost_handle+11661}
<ffffffff8015c830>{__rmqueue+218}
<ffffffffa060a451>{:ptlrpc:ptlrpc_check_req+17}
<ffffffffa060c629>{:ptlrpc:ptlrpc_server_handle_request+2457}
<ffffffffa04df45e>{:libcfs:lcw_update_time+30}
<ffffffff8013f734>{__mod_timer+293}
<ffffffffa060ed05>{:ptlrpc:ptlrpc_main+3989}
<ffffffff80133804>{default_wake_function+0}
<ffffffffa060d270>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa060d270>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffff80110de3>{child_rip+8}
<ffffffffa060dd70>{:ptlrpc:ptlrpc_main+0}
<ffffffff80110ddb>{child_rip+0}
Lustre: 0:0:(watchdog.c:130:lcw_cb()) Skipped 4 previous similar messages
Lustre: 0:0:(linux-debug.c:167:libcfs_debug_dumpstack()) showing stack
for process 17398
Lustre: 0:0:(linux-debug.c:167:libcfs_debug_dumpstack()) Skipped 4
previous similar messages
ll_ost_io_93 D 0000000000000002 0 17398 1 17399 17397
(L-TLB)
00000101157cd538 0000000000000046 00000101957cd616 ffffffff00000073
0000000000000000 0000000010776508 0000010001053a20 0000000200000001
00000101157a7800 0000000000161205
Call Trace:<ffffffff8030ec6c>{.text.lock.spinlock+2}
<ffffffff8030cc1f>{__down+147}
<ffffffff80133804>{default_wake_function+0}
<ffffffffa067b484>{:ko2iblnd:kiblnd_init_tx_msg+308}
<ffffffff8030e73d>{__down_failed+53}
<ffffffffa06c6670>{:lquota:filter_quota_check+0}
<ffffffffa0843acf>{:obdfilter:.text.lock.filter_io_26+35}
<ffffffffa04f5539>{:lvfs:pop_ctxt+505}
<ffffffff80131bc7>{recalc_task_prio+337}
<ffffffffa05160a0>{:lnet:lnet_send+2544}
<ffffffffa083b01e>{:obdfilter:filter_commitrw+126}
<ffffffffa07f8801>{:ost:ost_checksum_bulk+577}
<ffffffffa07f8688>{:ost:ost_checksum_bulk+200}
<ffffffffa07fddd1>{:ost:ost_brw_write+9505}
<ffffffff8017a62e>{end_buffer_async_read+0}
<ffffffff80133804>{default_wake_function+0}
<ffffffffa06015ef>{:ptlrpc:lustre_msg_get_version+95}
<ffffffffa07f8060>{:ost:ost_bulk_timeout+0}
<ffffffffa06016e5>{:ptlrpc:lustre_msg_check_version+69}
<ffffffffa07f8060>{:ost:ost_bulk_timeout+0}
<ffffffffa080364d>{:ost:ost_handle+11661}
<ffffffff8017a62e>{end_buffer_async_read+0}
<ffffffff8017a62e>{end_buffer_async_read+0}
<ffffffff8017a62e>{end_buffer_async_read+0}
<ffffffff8017a62e>{end_buffer_async_read+0}
<ffffffff8015c830>{__rmqueue+218} <ffffffff8015c830>{__rmqueue+218}
<ffffffffa060a451>{:ptlrpc:ptlrpc_check_req+17}
<ffffffffa060c629>{:ptlrpc:ptlrpc_server_handle_request+2457}
<ffffffffa04df45e>{:libcfs:lcw_update_time+30}
<ffffffff8013f734>{__mod_timer+293}
<ffffffffa060ed05>{:ptlrpc:ptlrpc_main+3989}
<ffffffff8017a62e>{end_buffer_async_read+0}
<ffffffff80133804>{default_wake_function+0}
<ffffffffa060d270>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa060d270>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffff80110de3>{child_rip+8}
<ffffffffa060dd70>{:ptlrpc:ptlrpc_main+0}
<ffffffff80110ddb>{child_rip+0}
LustreError: 17397:0:(filter_io_26.c:700:filter_commitrw_write())
testfs-OST0001: slow i_mutex 223s
LustreError: 17398:0:(filter_io_26.c:700:filter_commitrw_write())
testfs-OST0001: slow i_mutex 223s
LustreError: 17387:0:(filter_io_26.c:765:filter_commitrw_write())
testfs-OST0001: slow direct_io 223s
Lustre: 17397:0:(watchdog.c:312:lcw_update_time()) Expired watchdog for
pid 17397 disabled after 223.2918s
Lustre: 17387:0:(watchdog.c:312:lcw_update_time()) Expired watchdog for
pid 17387 disabled after 223.3108s
Lustre: 17387:0:(watchdog.c:312:lcw_update_time()) Skipped 5 previous
similar messages
slab: cache size-1620 error: slabs_full accounting error
slab: cache size-1620 error: slabs_full accounting error
slab: cache size-1620 error: slabs_full accounting error
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
<ffffffff801623c4>{s_show+62}
PML4 112a23067 PGD 114d4d067 PMD 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U)
ldiskfs(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ko2iblnd(U)
ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) sg(U) dell_rbu(U)
autofs4(U) i2c_nforce2(U) i2c_amd756(U) i2c_isa(U) i2c_amd8111(U)
i2c_i801(U) i2c_core(U) qlgc_vnic(U) iw_cxgb3(U) cxgb3(U) mlx4_ib(U)
mlx4_core(U) ib_mthca(U) ipmi_devintf(U) ipmi_si(U) ipmi_msghandler(U)
rdma_ucm(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) md5(U)
ipv6(U) cpufreq_powersave(U) mptctl(U) dm_mirror(U) dm_round_robin(U)
dm_multipath(U) dm_mod(U) sr_mod(U) usb_storage(U) joydev(U) button(U)
battery(U) ac(U) uhci_hcd(U) ehci_hcd(U) hw_random(U) ib_ipath(U)
ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U)
ata_piix(U) libata(U) ext3(U) jbd(U) tg3(U) s2io(U) qla2400(U)
qla2xxx(U) scsi_transport_fc(U) nfs(U) nfs_acl(U) lockd(U) sunrpc(U)
mptsas(U) mptscsi(U) mptbase(U) megaraid_sas(U) e1000(U) bnx2(U)
sd_mod(U) scsi_mod(U)
Pid: 15569, comm: collectl Not tainted 2.6.9-67.0.7.EL_lustre.1.6.5.1smp
RIP: 0010:[<ffffffff801623c4>] <ffffffff801623c4>{s_show+62}
RSP: 0018:0000010115823e68 EFLAGS: 00010012
RAX: ffffffff80329f7a RBX: 00000100cffa5580 RCX: 00000100cffa5501
RDX: 0000000000000004 RSI: 0000000000000000 RDI: 00000100cffa56e8
RBP: ffffffff80329f7a R08: 00000000fffffffd R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000001000 R14: 000001012b617b80 R15: 0000000000000020
FS: 0000002a9630ee80(0000) GS:ffffffff8048e780(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000012bd38000 CR4: 00000000000006e0
Process collectl (pid: 15569, threadinfo 0000010115822000, task
00000101280a4800)
Stack: 0000000000000000 0000000000000000 0000000000000008 00000100cffa5580
000001012b617b80 0000000000000000 0000000000001000 0000000000000ea2
0000000000000000 ffffffff80196c1a
Call Trace:<ffffffff80196c1a>{seq_read+445} <ffffffff80178c28>{vfs_read+207}
<ffffffff80178e84>{sys_read+69} <ffffffff8011022a>{system_call+126}
Code: 48 8b 06 0f 18 08 48 8d 83 18 01 00 00 48 39 c6 74 2e 8b 93
RIP <ffffffff801623c4>{s_show+62} RSP <0000010115823e68>
CR2: 0000000000000000
<0>Kernel panic - not syncing: Oops
NMI Watchdog detected LOCKUP, CPU=2, registers:
CPU 2
Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U)
ldiskfs(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ko2iblnd(U)
ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) sg(U) dell_rbu(U)
autofs4(U) i2c_nforce2(U) i2c_amd756(U) i2c_isa(U) i2c_amd8111(U)
i2c_i801(U) i2c_core(U) qlgc_vnic(U) iw_cxgb3(U) cxgb3(U) mlx4_ib(U)
mlx4_core(U) ib_mthca(U) ipmi_devintf(U) ipmi_si(U) ipmi_msghandler(U)
rdma_ucm(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) md5(U)
ipv6(U) cpufreq_powersave(U) mptctl(U) dm_mirror(U) dm_round_robin(U)
dm_multipath(U) dm_mod(U) sr_mod(U) usb_storage(U) joydev(U) button(U)
battery(U) ac(U) uhci_hcd(U) ehci_hcd(U) hw_random(U) ib_ipath(U)
ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U)
ata_piix(U) libata(U) ext3(U) jbd(U) tg3(U) s2io(U) qla2400(U)
qla2xxx(U) scsi_transport_fc(U) nfs(U) nfs_acl(U) lockd(U) sunrpc(U)
mptsas(U) mptscsi(U) mptbase(U) megaraid_sas(U) e1000(U) bnx2(U)
sd_mod(U) scsi_mod(U)
Pid: 12646, comm: klogd Not tainted 2.6.9-67.0.7.EL_lustre.1.6.5.1smp
RIP: 0010:[<ffffffff8030ec6c>] <ffffffff8030ec6c>{.text.lock.spinlock+2}
RSP: 0018:000001012a0a7b88 EFLAGS: 00000086
RAX: 0000000000000010 RBX: 00000100cffa56e8 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 00000000000004d0 RDI: 00000100cffa56e8
RBP: 000001012bc1e0c0 R08: 000001012a0a7cf0 R09: 6d5f697363732029
R10: 0000000000000053 R11: 0000000000000246 R12: 00000100cffa5688
R13: 00000100cffa5580 R14: 00000000000004d0 R15: 00000000000003e6
FS: 0000002a958a5b00(0000) GS:ffffffff8048e800(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000007fbfffa868 CR3: 000000012bd6e000 CR4: 00000000000006e0
Process klogd (pid: 12646, threadinfo 000001012a0a6000, task
00000101299e2030)
Stack: 000000000000000c ffffffff80160901 00000100cffa5580 00000100cffa5580
00000000000004d0 0000000000000400 0000000000000400 0000000000000000
00000000000003e6 ffffffff801607d0
Call Trace:<ffffffff80160901>{cache_alloc_refill+96}
<ffffffff801607d0>{__kmalloc+123}
<ffffffff802adc58>{alloc_skb+65}
<ffffffff802ac770>{sock_alloc_send_pskb+135}
<ffffffff80133855>{__wake_up_common+67}
<ffffffff801338ab>{__wake_up+54}
<ffffffff8030899a>{unix_dgram_sendmsg+364}
<ffffffff802aa430>{sock_aio_write+306}
<ffffffff80178d0f>{do_sync_write+178}
<ffffffff80137822>{do_syslog+482}
<ffffffff801351dc>{autoremove_wake_function+0}
<ffffffff801351dc>{autoremove_wake_function+0}
<ffffffff801351dc>{autoremove_wake_function+0}
<ffffffff80193ed0>{dnotify_parent+34}
<ffffffff80178e1d>{vfs_write+226} <ffffffff80178ef2>{sys_write+69}
<ffffffff8011022a>{system_call+126}
Code: 83 3b 00 7e f9 e9 60 fc ff ff f3 90 83 3b 00 7e f9 e9 ce fc
Kernel panic - not syncing: nmi watchdog
<1>Unable to handle kernel NULL pointer dereference at 00000000000000ff
RIP:
[<00000000000000ff>]
PML4 11b4a6067 PGD 0
Oops: 0010 [2] SMP
CPU 2
Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U)
ldiskfs(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ko2iblnd(U)
ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) sg(U) dell_rbu(U)
autofs4(U) i2c_nforce2(U) i2c_amd756(U) i2c_isa(U) i2c_amd8111(U)
i2c_i801(U) i2c_core(U) qlgc_vnic(U) iw_cxgb3(U) cxgb3(U) mlx4_ib(U)
mlx4_core(U) ib_mthca(U) ipmi_devintf(U) ipmi_si(U) ipmi_msghandler(U)
rdma_ucm(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) md5(U)
ipv6(U) cpufreq_powersave(U) mptctl(U) dm_mirror(U) dm_round_robin(U)
dm_multipath(U) dm_mod(U) sr_mod(U) usb_storage(U) joydev(U) button(U)
battery(U) ac(U) uhci_hcd(U) ehci_hcd(U) hw_random(U) ib_ipath(U)
ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U)
ata_piix(U) libata(U) ext3(U) jbd(U) tg3(U) s2io(U) qla2400(U)
qla2xxx(U) scsi_transport_fc(U) nfs(U) nfs_acl(U) lockd(U) sunrpc(U)
mptsas(U) mptscsi(U) mptbase(U) megaraid_sas(U) e1000(U) bnx2(U)
sd_mod(U) scsi_mod(U)
Pid: 12646, comm: klogd Not tainted 2.6.9-67.0.7.EL_lustre.1.6.5.1smp
RIP: 0010:[<00000000000000ff>] [<00000000000000ff>]
RSP: 0018:00000100cfb03fa0 EFLAGS: 00010006
RAX: 000001012a0a7fd8 RBX: 0000000000000000 RCX: 0000000000000002
RDX: 00000000000000ff RSI: 0000000000000000 RDI: 0000000000000002
RBP: 000001012bd71f58 R08: 0000000000000020 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 00000100cffa5688
R13: 00000100cffa5580 R14: 00000000000004d0 R15: 00000000000003e6
FS: 0000002a958a5b00(0000) GS:ffffffff8048e800(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000000ff CR3: 000000012bd6e000 CR4: 00000000000006e0
Process klogd (pid: 12646, threadinfo 000001012a0a6000, task
00000101299e2030)
Stack: ffffffff8011c56b ffffffff80321aac ffffffff80110a73
000001012bd71cf8 <EOI>
000000000000000d ffffffff80321aac 000000000000000d ffffffff80324122
0000000000000001 0000000000000000
Call Trace:<IRQ> <ffffffff8011c56b>{smp_call_function_interrupt+64}
<ffffffff80110a73>{call_function_interrupt+99} <EOI>
<ffffffff8011c51e>{smp_send_stop+76}
<ffffffff8013744a>{panic+235} <ffffffff801116fc>{show_stack+241}
<ffffffff80111826>{show_registers+277}
<ffffffff80111b2d>{die_nmi+130}
<ffffffff8011d042>{nmi_watchdog_tick+210}
<ffffffff801123fa>{default_do_nmi+112}
<ffffffff8011d0f8>{do_nmi+115} <ffffffff8011100f>{paranoid_exit+0}
<ffffffff8030ec6c>{.text.lock.spinlock+2}
Code: Bad RIP value.
RIP [<00000000000000ff>] RSP <00000100cfb03fa0>
CR2: 00000000000000ff
<0>Kernel panic - not syncing: Oops
Badness in panic at kernel/panic.c:118
Call Trace:<IRQ> <ffffffff8013756e>{panic+527}
<ffffffff801130a5>{do_IRQ+266}
<ffffffff801107d1>{ret_from_intr+0} <ffffffff80111988>{oops_end+38}
<ffffffff801119a3>{oops_end+65}
<ffffffff80123e29>{do_page_fault+1125}
<ffffffff801ea99e>{kobject_release+0}
<ffffffff80131c55>{activate_task+124}
<ffffffff80132180>{try_to_wake_up+876}
<ffffffff80110c2d>{error_exit+0}
<ffffffff8011c56b>{smp_call_function_interrupt+64}
<ffffffff80110a73>{call_function_interrupt+99} <EOI>
<ffffffff8011c51e>{smp_send_stop+76}
<ffffffff8013744a>{panic+235} <ffffffff801116fc>{show_stack+241}
<ffffffff80111826>{show_registers+277}
<ffffffff80111b2d>{die_nmi+130}
<ffffffff8011d042>{nmi_watchdog_tick+210}
<ffffffff801440f3>{notifier_call_chain+31}
<ffffffff801123fa>{default_do_nmi+112} <ffffffff8011d0f8>{do_nmi+115}
<ffffffff8011100f>{paranoid_exit+0} <ffffffff80111988>{oops_end+38}
Badness in i8042_panic_blink at drivers/input/serio/i8042.c:987
Call Trace:<IRQ> <ffffffff8024478f>{i8042_panic_blink+238}
<ffffffff8013751c>{panic+445}
<ffffffff801130a5>{do_IRQ+266} <ffffffff801107d1>{ret_from_intr+0}
<ffffffff80111988>{oops_end+38} <ffffffff801119a3>{oops_end+65}
<ffffffff80123e29>{do_page_fault+1125}
<ffffffff801ea99e>{kobject_release+0}
<ffffffff80131c55>{activate_task+124}
<ffffffff80132180>{try_to_wake_up+876}
<ffffffff80110c2d>{error_exit+0}
<ffffffff8011c56b>{smp_call_function_interrupt+64}
<ffffffff80110a73>{call_function_interrupt+99} <EOI>
<ffffffff8011c51e>{smp_send_stop+76}
<ffffffff8013744a>{panic+235} <ffffffff801116fc>{show_stack+241}
<ffffffff80111826>{show_registers+277}
<ffffffff80111b2d>{die_nmi+130}
<ffffffff8011d042>{nmi_watchdog_tick+210}
<ffffffff801440f3>{notifier_call_chain+31}
<ffffffff801123fa>{default_do_nmi+112} <ffffffff8011d0f8>{do_nmi+115}
<ffffffff8011100f>{paranoid_exit+0} <ffffffff80111988>{oops_end+38}
Badness in i8042_panic_blink at drivers/input/serio/i8042.c:990
Call Trace:<IRQ> <ffffffff80244821>{i8042_panic_blink+384}
<ffffffff8013751c>{panic+445}
<ffffffff801130a5>{do_IRQ+266} <ffffffff801107d1>{ret_from_intr+0}
<ffffffff80111988>{oops_end+38} <ffffffff801119a3>{oops_end+65}
<ffffffff80123e29>{do_page_fault+1125}
<ffffffff801ea99e>{kobject_release+0}
<ffffffff80131c55>{activate_task+124}
<ffffffff80132180>{try_to_wake_up+876}
<ffffffff80110c2d>{error_exit+0}
<ffffffff8011c56b>{smp_call_function_interrupt+64}
<ffffffff80110a73>{call_function_interrupt+99} <EOI>
<ffffffff8011c51e>{smp_send_stop+76}
<ffffffff8013744a>{panic+235} <ffffffff801116fc>{show_stack+241}
<ffffffff80111826>{show_registers+277}
<ffffffff80111b2d>{die_nmi+130}
<ffffffff8011d042>{nmi_watchdog_tick+210}
<ffffffff801440f3>{notifier_call_chain+31}
<ffffffff801123fa>{default_do_nmi+112} <ffffffff8011d0f8>{do_nmi+115}
<ffffffff8011100f>{paranoid_exit+0} <ffffffff80111988>{oops_end+38}
Badness in i8042_panic_blink at drivers/input/serio/i8042.c:992
Call Trace:<IRQ> <ffffffff80244886>{i8042_panic_blink+485}
<ffffffff8013751c>{panic+445}
<ffffffff801130a5>{do_IRQ+266} <ffffffff801107d1>{ret_from_intr+0}
<ffffffff80111988>{oops_end+38} <ffffffff801119a3>{oops_end+65}
<ffffffff80123e29>{do_page_fault+1125}
<ffffffff801ea99e>{kobject_release+0}
<ffffffff80131c55>{activate_task+124}
<ffffffff80132180>{try_to_wake_up+876}
<ffffffff80110c2d>{error_exit+0}
<ffffffff8011c56b>{smp_call_function_interrupt+64}
<ffffffff80110a73>{call_function_interrupt+99} <EOI>
<ffffffff8011c51e>{smp_send_stop+76}
<ffffffff8013744a>{panic+235} <ffffffff801116fc>{show_stack+241}
<ffffffff80111826>{show_registers+277}
<ffffffff80111b2d>{die_nmi+130}
<ffffffff8011d042>{nmi_watchdog_tick+210}
<ffffffff801440f3>{notifier_call_chain+31}
<ffffffff801123fa>{default_do_nmi+112} <ffffffff8011d0f8>{do_nmi+115}
<ffffffff8011100f>{paranoid_exit+0} <ffffffff80111988>{oops_end+38}
Thank you in advance for helping me with this.
Regards,
Wojciech
--
Wojciech Turek
Assistant System Manager
High Performance Computing Service
University of Cambridge
More information about the lustre-discuss
mailing list