[Lustre-discuss] MDS crash during mount, last_rcvd trick not working
Jakob Goldbach
jakob at goldbach.dk
Tue Jan 6 14:09:27 PST 2009
Hi,
My MDS crashed during MDT mount. The last_rcvd trick described in the
knowledge base is not working: the kernel still crashes after truncating
last_rcvd to 8k. (I have used it successfully before.)
Any ideas (other than upgrading from 1.6.4.3) on how to get my MDT
running again?
Thanks
/Jakob
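
For reference, the truncation step mentioned above can be sketched roughly as follows. This is only a minimal sketch, not the knowledge base procedure itself: it assumes the MDT device has already been mounted as ldiskfs at a hypothetical mount point such as /mnt/mdt, and the helper name truncate_with_backup is made up for illustration. The only value taken from this post is the 8k size.

```python
import os
import shutil

# "8k" as mentioned in the post; everything else here is an assumption.
LAST_RCVD_KEEP_BYTES = 8 * 1024


def truncate_with_backup(path, size=LAST_RCVD_KEEP_BYTES):
    """Copy `path` to `path + '.bak'`, then cut `path` down to `size` bytes.

    Files that are already `size` bytes or smaller are left unchanged.
    Returns the path of the backup copy.
    """
    backup = path + ".bak"
    shutil.copy2(path, backup)          # keep the original contents around
    if os.path.getsize(path) > size:
        os.truncate(path, size)         # drop everything past the first `size` bytes
    return backup


if __name__ == "__main__":
    # Hypothetical location of last_rcvd on an MDT mounted as ldiskfs.
    candidate = "/mnt/mdt/last_rcvd"
    if os.path.exists(candidate):
        print("backup written to", truncate_with_backup(candidate))
```

Keeping a backup copy means the original file can be restored if the truncated one does not help, which is the situation described here.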
[ 344.935438] BUG: scheduling while atomic: mount.lustre/0xffff8101/2024
[ 344.936754]
[ 344.936755] Call Trace:
[ 344.937738] [<ffffffff8025973a>] __sched_text_start+0x7a/0x769
[ 344.939092] ----------- [cut here ] --------- [please bite here ] ---------
[ 344.940751] Kernel BUG at kernel/sched.c:1008
[ 344.941801] invalid opcode: 0000 [1] SMP
[ 344.942784] CPU 0
[ 344.943308] Modules linked in: osc mds fsfilt_ldiskfs mgs mgc lustre lov lquota mdc ksocklnd ptlrpc obdclass lnet lvfs libcfs ldiskfs crc16 ipmi_devintf ipmi_si ipmi_msghandler bonding dm_snapshot dm_mirror dm_mod generic serio_raw piix ehci_hcd uhci_hcd ide_core
[ 344.949927] Pid: 2024, comm: mount.lustre Not tainted 2.6.18.8-bnx2-1.6.7b-cciss-3.6.18-5-lustre-1.6.4.3 #2
[ 344.951972] RIP: 0010:[<ffffffff80274371>] [<ffffffff80274371>] resched_task+0x24/0x65
[ 344.953893] RSP: 0018:ffffffff804ccdc0 EFLAGS: 00010002
[ 344.955099] RAX: 0000000000000001 RBX: 000000504ff8c8da RCX: ffff810124422000
[ 344.956687] RDX: ffff81012bd3bbc0 RSI: ffff810001023bf8 RDI: ffff81012b06a180
[ 344.958253] RBP: ffffffff804ccdc0 R08: 000000000000000d R09: 000000000000007f
[ 344.959865] R10: ffff81012baec420 R11: 0000000000000000 R12: ffff81012b8dd810
[ 344.961259] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8100010232a0
[ 344.962946] FS: 00002ac6e3d176d0(0000) GS:ffffffff8051a000(0000) knlGS:0000000000000000
[ 344.964530] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 344.965871] CR2: 00002b233f140160 CR3: 0000000124240000 CR4: 00000000000006e0
[ 344.967261] Process mount.lustre (pid: 2024, threadinfo ffff810124422000, task ffff81012b06a180)
[ 344.968992] Stack: ffffffff804cce20 ffffffff8024232e 0000000000000000 0000000000000001
[ 344.970865] 0000000000000001 0000000000000002 0000000000000082 ffff81012b8dd810
[ 344.972743] 000000000000000e 0000000000000001 ffff810001024d04 0000000000000000
[ 344.974502] Call Trace:
[ 344.975203] <IRQ> [<ffffffff8024232e>] try_to_wake_up+0x2e3/0x353
[ 344.976561] [<ffffffff8027fab7>] signal_wake_up+0x1e/0x2d
[ 344.977835] [<ffffffff8027fdcc>] __group_send_sig_info+0x89/0x94
[ 344.979030] [<ffffffff802551cf>] group_send_sig_info+0x4e/0x75
[ 344.980414] [<ffffffff80280cf3>] send_group_sig_info+0x28/0x35
[ 344.981591] [<ffffffff8027a99d>] it_real_fn+0x23/0x4f
[ 344.982775] [<ffffffff8027a97a>] it_real_fn+0x0/0x4f
[ 344.983792] [<ffffffff80249dbb>] hrtimer_run_queues+0x107/0x16d
[ 344.984974] [<ffffffff8027e434>] run_timer_softirq+0x21/0x1b0
[ 344.986369] [<ffffffff802101e5>] __do_softirq+0x5e/0xd6
[ 344.987602] [<ffffffff80305e65>] end_msi_irq_w_maskbit+0xf/0x1c
[ 344.994691] [<ffffffff80257f58>] call_softirq+0x1c/0x28
[ 344.996209] [<ffffffff802610a6>] do_softirq+0x2c/0x7d
[ 344.997383] [<ffffffff80261071>] do_IRQ+0x6a/0x73
[ 344.998472] [<ffffffff8025727d>] ret_from_intr+0x0/0xa
[ 344.999537] <EOI> [<ffffffff8027918c>] vprintk+0x29e/0x2ea
[ 345.000844] [<ffffffff80286a6c>] autoremove_wake_function+0x9/0x2e
[ 345.002332] [<ffffffff80273dbf>] __wake_up_common+0x3e/0x68
[ 345.003612] [<ffffffff8025973a>] __sched_text_start+0x7a/0x769
[ 345.004946] [<ffffffff80279226>] printk+0x4e/0x56
[ 345.006061] [<ffffffff8025973a>] __sched_text_start+0x7a/0x769
[ 345.007403] [<ffffffff8027918c>] vprintk+0x29e/0x2ea
[ 345.008607] [<ffffffff8028e1bc>] kallsyms_lookup+0xe7/0x1af
[ 345.009948] [<ffffffff8025973a>] __sched_text_start+0x7a/0x769
[ 345.011277] [<ffffffff8025f832>] printk_address+0x9f/0xac
[ 345.012519] [<ffffffff80279226>] printk+0x4e/0x56
[ 345.013507] [<ffffffff802f1216>] elv_insert+0xc9/0x192
[ 345.014549] [<ffffffff8025973a>] __sched_text_start+0x7a/0x769
[ 345.015890] [<ffffffff8025fa38>] show_trace+0x1f9/0x21f
[ 345.016965] [<ffffffff802130a8>] sync_buffer+0x0/0x3f
[ 345.018125] [<ffffffff8025fa70>] dump_stack+0x12/0x17
[ 345.019361] [<ffffffff8804a2bf>] :dm_mod:__map_bio+0x47/0x9b
[ 345.020664] [<ffffffff8025973a>] __sched_text_start+0x7a/0x769
[ 345.021999] [<ffffffff8023ab95>] lock_timer_base+0x1b/0x3c
[ 345.023258] [<ffffffff8022f226>] del_timer+0x4e/0x57
[ 345.024442] [<ffffffff802130a8>] sync_buffer+0x0/0x3f
[ 345.025663] [<ffffffff8025a59a>] io_schedule+0x28/0x34
[ 345.026919] [<ffffffff802130e3>] sync_buffer+0x3b/0x3f
[ 345.028132] [<ffffffff8025a8f5>] __wait_on_bit+0x40/0x6f
[ 345.029198] [<ffffffff802130a8>] sync_buffer+0x0/0x3f
[ 345.030400] [<ffffffff8025a990>] out_of_line_wait_on_bit+0x6c/0x78
[ 345.031660] [<ffffffff80286a91>] wake_bit_function+0x0/0x23
[ 345.032977] [<ffffffff80222c9f>] __bread+0x62/0x77
[ 345.034066] [<ffffffff880a1de2>] :ldiskfs:read_block_bitmap+0xa2/0xf0
[ 345.035359] [<ffffffff880a2695>] :ldiskfs:ldiskfs_free_blocks_sb+0x115/0x510
[ 345.036986] [<ffffffff880a2b21>] :ldiskfs:ldiskfs_free_blocks+0x91/0xe0
[ 345.038504] [<ffffffff880a7d1a>] :ldiskfs:ldiskfs_free_data+0x8a/0x110
[ 345.039828] [<ffffffff880a819c>] :ldiskfs:ldiskfs_truncate+0x20c/0x650
[ 345.041133] [<ffffffff802dbeab>] start_this_handle+0x355/0x405
[ 345.042556] [<ffffffff880a8bb4>] :ldiskfs:ldiskfs_delete_inode+0x84/0xf0
[ 345.044197] [<ffffffff880a8b30>] :ldiskfs:ldiskfs_delete_inode+0x0/0xf0
[ 345.045501] [<ffffffff8022c804>] generic_delete_inode+0x8e/0x10b
[ 345.046728] [<ffffffff883ed891>] :mds:mds_obd_destroy+0xa11/0xad0
[ 345.048128] [<ffffffff8022a2d7>] mntput_no_expire+0x19/0x8b
[ 345.049525] [<ffffffff8814961b>] :obdclass:llog_lvfs_close+0x6b/0x130
[ 345.051039] [<ffffffff8814a6c1>] :obdclass:llog_lvfs_destroy+0x841/0xa10
[ 345.052386] [<ffffffff88146a0f>] :obdclass:llog_cat_id2handle+0x4cf/0x5f0
[ 345.053994] [<ffffffff8021557d>] cache_grow+0x2ee/0x343
[ 345.055074] [<ffffffff881509c5>] :obdclass:cat_cancel_cb+0x405/0x630
[ 345.056634] [<ffffffff88146129>] :obdclass:llog_process+0xa09/0xe20
[ 345.058192] [<ffffffff8020c894>] dput+0x23/0x152
[ 345.059280] [<ffffffff881505c0>] :obdclass:cat_cancel_cb+0x0/0x630
[ 345.060717] [<ffffffff881503b3>] :obdclass:llog_obd_origin_setup+0x773/0x980
[ 345.062330] [<ffffffff8027486e>] find_busiest_group+0x20d/0x634
[ 345.063694] [<ffffffff8021819f>] vsnprintf+0x55e/0x5a3
[ 345.064967] [<ffffffff8815137d>] :obdclass:llog_setup+0x78d/0x860
[ 345.066364] [<ffffffff8842da94>] :osc:osc_llog_init+0x104/0x390
[ 345.067748] [<ffffffff880d7f48>] :libcfs:cfs_alloc+0x28/0x60
[ 345.069099] [<ffffffff8814e979>] :obdclass:obd_llog_init+0x179/0x210
[ 345.070579] [<ffffffff882b92ca>] :lov:lov_llog_init+0x2ca/0x400
[ 345.071958] [<ffffffff8814e979>] :obdclass:obd_llog_init+0x179/0x210
[ 345.073485] [<ffffffff8022a2d7>] mntput_no_expire+0x19/0x8b
[ 345.074837] [<ffffffff883b31ad>] :mds:mds_llog_init+0x1ad/0x270
[ 345.076015] [<ffffffff8029abcb>] map_vm_area+0x229/0x2a8
[ 345.077175] [<ffffffff8814e979>] :obdclass:obd_llog_init+0x179/0x210
[ 345.078448] [<ffffffff8029af5b>] __vmalloc_area_node+0x12b/0x153
[ 345.079650] [<ffffffff8814edc5>] :obdclass:llog_cat_initialize+0x3b5/0x670
[ 345.081268] [<ffffffff882cdc61>] :lov:lov_get_info+0x9f1/0xaa0
[ 345.082616] [<ffffffff8025a990>] out_of_line_wait_on_bit+0x6c/0x78
[ 345.083841] [<ffffffff80286a91>] wake_bit_function+0x0/0x23
[ 345.085059] [<ffffffff883bc5ac>] :mds:mds_lov_update_desc+0xbcc/0xd30
[ 345.086619] [<ffffffff883c0e21>] :mds:mds_lov_connect+0x12c1/0x2020
[ 345.088059] [<ffffffff880d7f48>] :libcfs:cfs_alloc+0x28/0x60
[ 345.089271] [<ffffffff8815135e>] :obdclass:llog_setup+0x76e/0x860
[ 345.090497] [<ffffffff880d7f48>] :libcfs:cfs_alloc+0x28/0x60
[ 345.091872] [<ffffffff880f9db8>] :lvfs:upcall_cache_init+0x2f8/0x3a0
[ 345.093153] [<ffffffff883ce381>] :mds:mds_setup+0x10a1/0x1bd0
[ 345.094315] [<ffffffff8021557d>] cache_grow+0x2ee/0x343
[ 345.095371] [<ffffffff802562d7>] cache_alloc_refill+0xde/0x1da
[ 345.096740] [<ffffffff880d7f48>] :libcfs:cfs_alloc+0x28/0x60
[ 345.098050] [<ffffffff8815a5cd>] :obdclass:class_new_export+0x52d/0x5b0
[ 345.099458] [<ffffffff8816fcdb>] :obdclass:class_setup+0x8bb/0xbe0
[ 345.100697] [<ffffffff8817236a>] :obdclass:class_process_config+0x14ca/0x19f0
[ 345.102340] [<ffffffff881756da>] :obdclass:class_config_llog_handler+0x153a/0x1990
[ 345.104079] [<ffffffff80224869>] do_filp_open+0x2d/0x3d
[ 345.105317] [<ffffffff8814bcfc>] :obdclass:llog_lvfs_next_block+0x2ac/0x710
[ 345.106876] [<ffffffff88146129>] :obdclass:llog_process+0xa09/0xe20
[ 345.108321] [<ffffffff880d7f48>] :libcfs:cfs_alloc+0x28/0x60
[ 345.109465] [<ffffffff881741a0>] :obdclass:class_config_llog_handler+0x0/0x1990
[ 345.111169] [<ffffffff8817402f>] :obdclass:class_config_parse_llog+0x43f/0x5b0
[ 345.112828] [<ffffffff8020c8a5>] dput+0x34/0x152
[ 345.113868] [<ffffffff880f9052>] :lvfs:lustre_rename+0x482/0x530
[ 345.115157] [<ffffffff88143fea>] :obdclass:llog_close+0x1aa/0x230
[ 345.116668] [<ffffffff8836fe03>] :mgc:mgc_process_log+0x20f3/0x2640
[ 345.117916] [<ffffffff88370b90>] :mgc:mgc_blocking_ast+0x0/0x450
[ 345.119221] [<ffffffff881ddeb0>] :ptlrpc:ldlm_completion_ast+0x0/0x6a0
[ 345.120556] [<ffffffff8836d85c>] :mgc:config_log_find+0x19c/0x340
[ 345.121954] [<ffffffff88373fc2>] :mgc:mgc_process_config+0xe02/0x1280
[ 345.123472] [<ffffffff881795bc>] :obdclass:lustre_process_log+0xb2c/0xee0
[ 345.125033] [<ffffffff88179a40>] :obdclass:server_find_mount+0x80/0x190
[ 345.126421] [<ffffffff8817f7a6>] :obdclass:server_start_targets+0xb36/0x17e0
[ 345.127819] [<ffffffff8022d4ac>] __up_write+0x21/0x10d
[ 345.128871] [<ffffffff88183c27>] :obdclass:server_fill_super+0x18c7/0x1ee0
[ 345.130308] [<ffffffff80208d6d>] __d_lookup+0xb0/0x100
[ 345.131812] [<ffffffff880d7f48>] :libcfs:cfs_alloc+0x28/0x60
[ 345.132994] [<ffffffff881778bf>] :obdclass:lustre_init_lsi+0x29f/0x660
[ 345.134301] [<ffffffff88184240>] :obdclass:lustre_fill_super+0x0/0x1ae0
[ 345.135680] [<ffffffff88185ba3>] :obdclass:lustre_fill_super+0x1963/0x1ae0
[ 345.137254] [<ffffffff802a95f5>] set_anon_super+0x3c/0xab
[ 345.138372] [<ffffffff802a95b9>] set_anon_super+0x0/0xab
[ 345.139609] [<ffffffff88184240>] :obdclass:lustre_fill_super+0x0/0x1ae0
[ 345.141115] [<ffffffff802a9805>] get_sb_nodev+0x4f/0x97
[ 345.142318] [<ffffffff802a910b>] vfs_kern_mount+0x93/0x11a
[ 345.143573] [<ffffffff802a91d4>] do_kern_mount+0x36/0x4d
[ 345.144754] [<ffffffff802b1982>] do_mount+0x68c/0x6ff
[ 345.145930] [<ffffffff802088d3>] __handle_mm_fault+0x530/0x91a
[ 345.147288] [<ffffffff80218776>] remove_vma+0x55/0x5c
[ 345.148307] [<ffffffff8021f84a>] __up_read+0x13/0x8a
[ 345.149455] [<ffffffff8020a6af>] do_page_fault+0x3d1/0x706
[ 345.150715] [<ffffffff8020c2e4>] do_path_lookup+0x268/0x28c
[ 345.151992] [<ffffffff80297807>] zone_statistics+0x3e/0x6d
[ 345.153145] [<ffffffff8020dcbc>] __alloc_pages+0x5c/0x29b
[ 345.154399] [<ffffffff802472dd>] sys_mount+0x8a/0xd7
[ 345.155550] [<ffffffff80256d82>] system_call+0x7e/0x83
[ 345.156591]
[ 345.156966]
[ 345.156967] Code: 0f 0b 68 aa 6c 3f 80 c2 f0 03 8b 41 10 a8 08 75 2e f0 0f ba
[ 345.159822] RIP [<ffffffff80274371>] resched_task+0x24/0x65
[ 345.161214] RSP <ffffffff804ccdc0>
[ 345.161948] <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
[ 345.163565]