[Lustre-discuss] MDS crash during mount, last_rcvd trick not working

Jakob Goldbach jakob at goldbach.dk
Tue Jan 6 14:09:27 PST 2009


Hi,

My MDS crashed during MDT mount. The last_rcvd trick described in the
knowledge base is not working: the kernel still crashes after truncating
last_rcvd to 8k. (I have used it successfully before.)
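
For reference, the truncation step amounts to something like the sketch
below. It is only an illustration: it assumes the MDT device has been
mounted as ldiskfs at a hypothetical mount point, that "8k" means 8192
bytes, and that a backup copy of the file is wanted first. The function
name and the backup suffix are made up for this example.

```python
import os
import shutil


def truncate_last_rcvd(path, size=8192, backup_suffix=".bak"):
    """Copy `path` to a backup file, then truncate it to `size` bytes.

    Returns the size of the file after truncation.
    """
    backup = path + backup_suffix
    shutil.copy2(path, backup)  # keep the original contents, just in case
    os.truncate(path, size)     # cut the file down to `size` bytes
    return os.path.getsize(path)


# Hypothetical usage, with the MDT mounted as ldiskfs at /mnt/mdt
# (the mount point is an assumption, not taken from the post):
#   truncate_last_rcvd("/mnt/mdt/last_rcvd")
```

The backup is written next to the original so the file can be restored
if the truncated copy does not help.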

Any ideas (other than upgrading from 1.6.4.3) on getting my MDT running
again?

Thanks
/Jakob

[  344.935438] BUG: scheduling while atomic:
mount.lustre/0xffff8101/2024
[  344.936754] 
[  344.936755] Call Trace:
[  344.937738]  [<ffffffff8025973a>] __sched_text_start+0x7a/0x769
[  344.939092] ----------- [cut here ] --------- [please bite here ]
---------
[  344.940751] Kernel BUG at kernel/sched.c:1008
[  344.941801] invalid opcode: 0000 [1] SMP 
[  344.942784] CPU 0 
[  344.943308] Modules linked in: osc mds fsfilt_ldiskfs mgs mgc lustre
lov lquota mdc ksocklnd ptlrpc obdclass lnet lvfs libcfs ldiskfs crc16
ipmi_devintf ipmi_si ipmi_msghandler bonding dm_snapshot dm_mirror
dm_mod generic serio_raw piix ehci_hcd uhci_hcd ide_core
[  344.949927] Pid: 2024, comm: mount.lustre Not tainted
2.6.18.8-bnx2-1.6.7b-cciss-3.6.18-5-lustre-1.6.4.3 #2
[  344.951972] RIP: 0010:[<ffffffff80274371>]  [<ffffffff80274371>]
resched_task+0x24/0x65
[  344.953893] RSP: 0018:ffffffff804ccdc0  EFLAGS: 00010002
[  344.955099] RAX: 0000000000000001 RBX: 000000504ff8c8da RCX:
ffff810124422000
[  344.956687] RDX: ffff81012bd3bbc0 RSI: ffff810001023bf8 RDI:
ffff81012b06a180
[  344.958253] RBP: ffffffff804ccdc0 R08: 000000000000000d R09:
000000000000007f
[  344.959865] R10: ffff81012baec420 R11: 0000000000000000 R12:
ffff81012b8dd810
[  344.961259] R13: 0000000000000000 R14: 0000000000000000 R15:
ffff8100010232a0
[  344.962946] FS:  00002ac6e3d176d0(0000) GS:ffffffff8051a000(0000)
knlGS:0000000000000000
[  344.964530] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  344.965871] CR2: 00002b233f140160 CR3: 0000000124240000 CR4:
00000000000006e0
[  344.967261] Process mount.lustre (pid: 2024, threadinfo
ffff810124422000, task ffff81012b06a180)
[  344.968992] Stack:  ffffffff804cce20 ffffffff8024232e
0000000000000000 0000000000000001
[  344.970865]  0000000000000001 0000000000000002 0000000000000082
ffff81012b8dd810
[  344.972743]  000000000000000e 0000000000000001 ffff810001024d04
0000000000000000
[  344.974502] Call Trace:
[  344.975203]  <IRQ> [<ffffffff8024232e>] try_to_wake_up+0x2e3/0x353
[  344.976561]  [<ffffffff8027fab7>] signal_wake_up+0x1e/0x2d
[  344.977835]  [<ffffffff8027fdcc>] __group_send_sig_info+0x89/0x94
[  344.979030]  [<ffffffff802551cf>] group_send_sig_info+0x4e/0x75
[  344.980414]  [<ffffffff80280cf3>] send_group_sig_info+0x28/0x35
[  344.981591]  [<ffffffff8027a99d>] it_real_fn+0x23/0x4f
[  344.982775]  [<ffffffff8027a97a>] it_real_fn+0x0/0x4f
[  344.983792]  [<ffffffff80249dbb>] hrtimer_run_queues+0x107/0x16d
[  344.984974]  [<ffffffff8027e434>] run_timer_softirq+0x21/0x1b0
[  344.986369]  [<ffffffff802101e5>] __do_softirq+0x5e/0xd6
[  344.987602]  [<ffffffff80305e65>] end_msi_irq_w_maskbit+0xf/0x1c
[  344.994691]  [<ffffffff80257f58>] call_softirq+0x1c/0x28
[  344.996209]  [<ffffffff802610a6>] do_softirq+0x2c/0x7d
[  344.997383]  [<ffffffff80261071>] do_IRQ+0x6a/0x73
[  344.998472]  [<ffffffff8025727d>] ret_from_intr+0x0/0xa
[  344.999537]  <EOI> [<ffffffff8027918c>] vprintk+0x29e/0x2ea
[  345.000844]  [<ffffffff80286a6c>] autoremove_wake_function+0x9/0x2e
[  345.002332]  [<ffffffff80273dbf>] __wake_up_common+0x3e/0x68
[  345.003612]  [<ffffffff8025973a>] __sched_text_start+0x7a/0x769
[  345.004946]  [<ffffffff80279226>] printk+0x4e/0x56
[  345.006061]  [<ffffffff8025973a>] __sched_text_start+0x7a/0x769
[  345.007403]  [<ffffffff8027918c>] vprintk+0x29e/0x2ea
[  345.008607]  [<ffffffff8028e1bc>] kallsyms_lookup+0xe7/0x1af
[  345.009948]  [<ffffffff8025973a>] __sched_text_start+0x7a/0x769
[  345.011277]  [<ffffffff8025f832>] printk_address+0x9f/0xac
[  345.012519]  [<ffffffff80279226>] printk+0x4e/0x56
[  345.013507]  [<ffffffff802f1216>] elv_insert+0xc9/0x192
[  345.014549]  [<ffffffff8025973a>] __sched_text_start+0x7a/0x769
[  345.015890]  [<ffffffff8025fa38>] show_trace+0x1f9/0x21f
[  345.016965]  [<ffffffff802130a8>] sync_buffer+0x0/0x3f
[  345.018125]  [<ffffffff8025fa70>] dump_stack+0x12/0x17
[  345.019361]  [<ffffffff8804a2bf>] :dm_mod:__map_bio+0x47/0x9b
[  345.020664]  [<ffffffff8025973a>] __sched_text_start+0x7a/0x769
[  345.021999]  [<ffffffff8023ab95>] lock_timer_base+0x1b/0x3c
[  345.023258]  [<ffffffff8022f226>] del_timer+0x4e/0x57
[  345.024442]  [<ffffffff802130a8>] sync_buffer+0x0/0x3f
[  345.025663]  [<ffffffff8025a59a>] io_schedule+0x28/0x34
[  345.026919]  [<ffffffff802130e3>] sync_buffer+0x3b/0x3f
[  345.028132]  [<ffffffff8025a8f5>] __wait_on_bit+0x40/0x6f
[  345.029198]  [<ffffffff802130a8>] sync_buffer+0x0/0x3f
[  345.030400]  [<ffffffff8025a990>] out_of_line_wait_on_bit+0x6c/0x78
[  345.031660]  [<ffffffff80286a91>] wake_bit_function+0x0/0x23
[  345.032977]  [<ffffffff80222c9f>] __bread+0x62/0x77
[  345.034066]  [<ffffffff880a1de2>] :ldiskfs:read_block_bitmap
+0xa2/0xf0
[  345.035359]  [<ffffffff880a2695>] :ldiskfs:ldiskfs_free_blocks_sb
+0x115/0x510
[  345.036986]  [<ffffffff880a2b21>] :ldiskfs:ldiskfs_free_blocks
+0x91/0xe0
[  345.038504]  [<ffffffff880a7d1a>] :ldiskfs:ldiskfs_free_data
+0x8a/0x110
[  345.039828]  [<ffffffff880a819c>] :ldiskfs:ldiskfs_truncate
+0x20c/0x650
[  345.041133]  [<ffffffff802dbeab>] start_this_handle+0x355/0x405
[  345.042556]  [<ffffffff880a8bb4>] :ldiskfs:ldiskfs_delete_inode
+0x84/0xf0
[  345.044197]  [<ffffffff880a8b30>] :ldiskfs:ldiskfs_delete_inode
+0x0/0xf0
[  345.045501]  [<ffffffff8022c804>] generic_delete_inode+0x8e/0x10b
[  345.046728]  [<ffffffff883ed891>] :mds:mds_obd_destroy+0xa11/0xad0
[  345.048128]  [<ffffffff8022a2d7>] mntput_no_expire+0x19/0x8b
[  345.049525]  [<ffffffff8814961b>] :obdclass:llog_lvfs_close
+0x6b/0x130
[  345.051039]  [<ffffffff8814a6c1>] :obdclass:llog_lvfs_destroy
+0x841/0xa10
[  345.052386]  [<ffffffff88146a0f>] :obdclass:llog_cat_id2handle
+0x4cf/0x5f0
[  345.053994]  [<ffffffff8021557d>] cache_grow+0x2ee/0x343
[  345.055074]  [<ffffffff881509c5>] :obdclass:cat_cancel_cb+0x405/0x630
[  345.056634]  [<ffffffff88146129>] :obdclass:llog_process+0xa09/0xe20
[  345.058192]  [<ffffffff8020c894>] dput+0x23/0x152
[  345.059280]  [<ffffffff881505c0>] :obdclass:cat_cancel_cb+0x0/0x630
[  345.060717]  [<ffffffff881503b3>] :obdclass:llog_obd_origin_setup
+0x773/0x980
[  345.062330]  [<ffffffff8027486e>] find_busiest_group+0x20d/0x634
[  345.063694]  [<ffffffff8021819f>] vsnprintf+0x55e/0x5a3
[  345.064967]  [<ffffffff8815137d>] :obdclass:llog_setup+0x78d/0x860
[  345.066364]  [<ffffffff8842da94>] :osc:osc_llog_init+0x104/0x390
[  345.067748]  [<ffffffff880d7f48>] :libcfs:cfs_alloc+0x28/0x60
[  345.069099]  [<ffffffff8814e979>] :obdclass:obd_llog_init+0x179/0x210
[  345.070579]  [<ffffffff882b92ca>] :lov:lov_llog_init+0x2ca/0x400
[  345.071958]  [<ffffffff8814e979>] :obdclass:obd_llog_init+0x179/0x210
[  345.073485]  [<ffffffff8022a2d7>] mntput_no_expire+0x19/0x8b
[  345.074837]  [<ffffffff883b31ad>] :mds:mds_llog_init+0x1ad/0x270
[  345.076015]  [<ffffffff8029abcb>] map_vm_area+0x229/0x2a8
[  345.077175]  [<ffffffff8814e979>] :obdclass:obd_llog_init+0x179/0x210
[  345.078448]  [<ffffffff8029af5b>] __vmalloc_area_node+0x12b/0x153
[  345.079650]  [<ffffffff8814edc5>] :obdclass:llog_cat_initialize
+0x3b5/0x670
[  345.081268]  [<ffffffff882cdc61>] :lov:lov_get_info+0x9f1/0xaa0
[  345.082616]  [<ffffffff8025a990>] out_of_line_wait_on_bit+0x6c/0x78
[  345.083841]  [<ffffffff80286a91>] wake_bit_function+0x0/0x23
[  345.085059]  [<ffffffff883bc5ac>] :mds:mds_lov_update_desc
+0xbcc/0xd30
[  345.086619]  [<ffffffff883c0e21>] :mds:mds_lov_connect+0x12c1/0x2020
[  345.088059]  [<ffffffff880d7f48>] :libcfs:cfs_alloc+0x28/0x60
[  345.089271]  [<ffffffff8815135e>] :obdclass:llog_setup+0x76e/0x860
[  345.090497]  [<ffffffff880d7f48>] :libcfs:cfs_alloc+0x28/0x60
[  345.091872]  [<ffffffff880f9db8>] :lvfs:upcall_cache_init+0x2f8/0x3a0
[  345.093153]  [<ffffffff883ce381>] :mds:mds_setup+0x10a1/0x1bd0
[  345.094315]  [<ffffffff8021557d>] cache_grow+0x2ee/0x343
[  345.095371]  [<ffffffff802562d7>] cache_alloc_refill+0xde/0x1da
[  345.096740]  [<ffffffff880d7f48>] :libcfs:cfs_alloc+0x28/0x60
[  345.098050]  [<ffffffff8815a5cd>] :obdclass:class_new_export
+0x52d/0x5b0
[  345.099458]  [<ffffffff8816fcdb>] :obdclass:class_setup+0x8bb/0xbe0
[  345.100697]  [<ffffffff8817236a>] :obdclass:class_process_config
+0x14ca/0x19f0
[  345.102340]  [<ffffffff881756da>] :obdclass:class_config_llog_handler
+0x153a/0x1990
[  345.104079]  [<ffffffff80224869>] do_filp_open+0x2d/0x3d
[  345.105317]  [<ffffffff8814bcfc>] :obdclass:llog_lvfs_next_block
+0x2ac/0x710
[  345.106876]  [<ffffffff88146129>] :obdclass:llog_process+0xa09/0xe20
[  345.108321]  [<ffffffff880d7f48>] :libcfs:cfs_alloc+0x28/0x60
[  345.109465]  [<ffffffff881741a0>] :obdclass:class_config_llog_handler
+0x0/0x1990
[  345.111169]  [<ffffffff8817402f>] :obdclass:class_config_parse_llog
+0x43f/0x5b0
[  345.112828]  [<ffffffff8020c8a5>] dput+0x34/0x152
[  345.113868]  [<ffffffff880f9052>] :lvfs:lustre_rename+0x482/0x530
[  345.115157]  [<ffffffff88143fea>] :obdclass:llog_close+0x1aa/0x230
[  345.116668]  [<ffffffff8836fe03>] :mgc:mgc_process_log+0x20f3/0x2640
[  345.117916]  [<ffffffff88370b90>] :mgc:mgc_blocking_ast+0x0/0x450
[  345.119221]  [<ffffffff881ddeb0>] :ptlrpc:ldlm_completion_ast
+0x0/0x6a0
[  345.120556]  [<ffffffff8836d85c>] :mgc:config_log_find+0x19c/0x340
[  345.121954]  [<ffffffff88373fc2>] :mgc:mgc_process_config
+0xe02/0x1280
[  345.123472]  [<ffffffff881795bc>] :obdclass:lustre_process_log
+0xb2c/0xee0
[  345.125033]  [<ffffffff88179a40>] :obdclass:server_find_mount
+0x80/0x190
[  345.126421]  [<ffffffff8817f7a6>] :obdclass:server_start_targets
+0xb36/0x17e0
[  345.127819]  [<ffffffff8022d4ac>] __up_write+0x21/0x10d
[  345.128871]  [<ffffffff88183c27>] :obdclass:server_fill_super
+0x18c7/0x1ee0
[  345.130308]  [<ffffffff80208d6d>] __d_lookup+0xb0/0x100
[  345.131812]  [<ffffffff880d7f48>] :libcfs:cfs_alloc+0x28/0x60
[  345.132994]  [<ffffffff881778bf>] :obdclass:lustre_init_lsi
+0x29f/0x660
[  345.134301]  [<ffffffff88184240>] :obdclass:lustre_fill_super
+0x0/0x1ae0
[  345.135680]  [<ffffffff88185ba3>] :obdclass:lustre_fill_super
+0x1963/0x1ae0
[  345.137254]  [<ffffffff802a95f5>] set_anon_super+0x3c/0xab
[  345.138372]  [<ffffffff802a95b9>] set_anon_super+0x0/0xab
[  345.139609]  [<ffffffff88184240>] :obdclass:lustre_fill_super
+0x0/0x1ae0
[  345.141115]  [<ffffffff802a9805>] get_sb_nodev+0x4f/0x97
[  345.142318]  [<ffffffff802a910b>] vfs_kern_mount+0x93/0x11a
[  345.143573]  [<ffffffff802a91d4>] do_kern_mount+0x36/0x4d
[  345.144754]  [<ffffffff802b1982>] do_mount+0x68c/0x6ff
[  345.145930]  [<ffffffff802088d3>] __handle_mm_fault+0x530/0x91a
[  345.147288]  [<ffffffff80218776>] remove_vma+0x55/0x5c
[  345.148307]  [<ffffffff8021f84a>] __up_read+0x13/0x8a
[  345.149455]  [<ffffffff8020a6af>] do_page_fault+0x3d1/0x706
[  345.150715]  [<ffffffff8020c2e4>] do_path_lookup+0x268/0x28c
[  345.151992]  [<ffffffff80297807>] zone_statistics+0x3e/0x6d
[  345.153145]  [<ffffffff8020dcbc>] __alloc_pages+0x5c/0x29b
[  345.154399]  [<ffffffff802472dd>] sys_mount+0x8a/0xd7
[  345.155550]  [<ffffffff80256d82>] system_call+0x7e/0x83
[  345.156591] 
[  345.156966] 
[  345.156967] Code: 0f 0b 68 aa 6c 3f 80 c2 f0 03 8b 41 10 a8 08 75 2e
f0 0f ba 
[  345.159822] RIP  [<ffffffff80274371>] resched_task+0x24/0x65
[  345.161214]  RSP <ffffffff804ccdc0>
[  345.161948]  <0>Kernel panic - not syncing: Aiee, killing interrupt
handler!
[  345.163565]  




