[Lustre-discuss] MDS crash during mount, last_rcvd trick not working

Jakob Goldbach jakob at goldbach.dk
Tue Jan 6 22:02:52 PST 2009


On Tue, 2009-01-06 at 23:09 +0100, Jakob Goldbach wrote:
> 
> Any ideas (other than upgrading from 1.6.4.3) on getting my MDT running
> again ?
> 

I tried upgrading to Linux 2.6.22.19 with Lustre 1.6.5.1, since I already run
that combination on a different setup. The mount panics as well.

Any ideas?

Thanks,
/Jakob


Lustre: Enabling user_xattr
BUG: scheduling while atomic: mount.lustre/0xffff8101/19019
Call Trace:
[<ffffffff804d83b7>] schedule+0x5f/0x8df
[<ffffffff8023b08d>] del_timer+0x57/0x62
[<ffffffff80313e72>] __generic_unplug_device+0x25/0x29
[<ffffffff80314236>] generic_unplug_device+0x20/0x32
[<ffffffff804d8d58>] io_schedule+0x2d/0x39
[<ffffffff8029bc3e>] sync_buffer+0x3b/0x3f
[<ffffffff804d9349>] __wait_on_bit+0x47/0x79
[<ffffffff8029bc03>] sync_buffer+0x0/0x3f
[<ffffffff8029bc03>] sync_buffer+0x0/0x3f
[<ffffffff804d93e5>] out_of_line_wait_on_bit+0x6a/0x77
[<ffffffff802451dc>] wake_bit_function+0x0/0x2a
[<ffffffff8029c095>] ll_rw_block+0x95/0xbc
[<ffffffff8029bbc1>] __wait_on_buffer+0x20/0x22
[<ffffffff88065fc9>] :ldiskfs:ldiskfs_bread+0x59/0x80
[<ffffffff883ce506>] :fsfilt_ldiskfs:fsfilt_ldiskfs_read_record+0x106/0x210
[<ffffffff8025f1bc>] __alloc_pages+0x83/0x2c6
[<ffffffff88110da8>] :obdclass:llog_lvfs_read_blob+0x58/0x220
[<ffffffff80278b92>] cache_alloc_refill+0x84/0x4ee
[<ffffffff8027a74b>] __dentry_open+0x111/0x1bd
[<ffffffff88111811>] :obdclass:llog_lvfs_read_header+0x1a1/0x440
[<ffffffff8809a2ee>] :libcfs:cfs_alloc+0x5e/0x90
[<ffffffff8810d183>] :obdclass:llog_init_handle+0xe3/0x8a0
[<ffffffff8029bbc1>] __wait_on_buffer+0x20/0x22
[<ffffffff88065fc9>] :ldiskfs:ldiskfs_bread+0x59/0x80
[<ffffffff8810dd6f>] :obdclass:llog_cat_id2handle+0x18f/0x630
[<ffffffff8811762b>] :obdclass:cat_cancel_cb+0x5b/0x6a0
[<ffffffff8810c96e>] :obdclass:llog_process+0x69e/0xdd0
[<ffffffff802909d3>] mntput_no_expire+0x20/0x7d
[<ffffffff881175d0>] :obdclass:cat_cancel_cb+0x0/0x6a0
[<ffffffff880bc10e>] :lvfs:pop_ctxt+0xae/0x2e0
[<ffffffff88118644>] :obdclass:llog_obd_origin_setup+0x684/0xb00
[<ffffffff88118f8e>] :obdclass:llog_setup+0x4ce/0x840
[<ffffffff8826f84f>] :osc:osc_llog_init+0x12f/0x410
[<ffffffff8809a2ee>] :libcfs:cfs_alloc+0x5e/0x90
[<ffffffff881166fe>] :obdclass:obd_llog_init+0xae/0x240
[<ffffffff8029bc03>] sync_buffer+0x0/0x3f
[<ffffffff882d8014>] :lov:lov_llog_init+0x274/0x440
[<ffffffff881166fe>] :obdclass:obd_llog_init+0xae/0x240
[<ffffffff883d95b5>] :mds:mds_llog_init+0x1d5/0x280
[<ffffffff881166fe>] :obdclass:obd_llog_init+0xae/0x240
[<ffffffff8026febc>] __vmalloc_node+0x58/0x65
[<ffffffff88116a55>] :obdclass:llog_cat_initialize+0x1c5/0x690
[<ffffffff882f13d8>] :lov:lov_get_info+0x98/0xbf0
[<ffffffff883e2aec>] :mds:mds_lov_update_desc+0x25c/0x9f0
[<ffffffff883ea462>] :mds:mds_lov_connect+0x7e2/0x1b70
[<ffffffff8809a2ee>] :libcfs:cfs_alloc+0x5e/0x90
[<ffffffff88118f66>] :obdclass:llog_setup+0x4a6/0x840
[<ffffffff88137ab7>] :obdclass:class_get_profile+0x67/0x1d0
[<ffffffff883f0e5b>] :mds:mds_setup+0x10fb/0x1bf0
[<ffffffff80278b92>] cache_alloc_refill+0x84/0x4ee
[<ffffffff8809a2ee>] :libcfs:cfs_alloc+0x5e/0x90
[<ffffffff88124766>] :obdclass:class_new_export+0x1f6/0x550
[<ffffffff8813b271>] :obdclass:class_setup+0x7f1/0xcd0
[<ffffffff88122009>] :obdclass:class_name2dev+0x59/0xe0
[<ffffffff8813e1ab>] :obdclass:class_process_config+0x147b/0x1c70
[<ffffffff881406ef>] :obdclass:class_config_llog_handler+0xdef/0x1c50
[<ffffffff8027a8da>] do_filp_open+0x39/0x4b
[<ffffffff88110da8>] :obdclass:llog_lvfs_read_blob+0x58/0x220
[<ffffffff8810c96e>] :obdclass:llog_process+0x69e/0xdd0
[<ffffffff8809a2ee>] :libcfs:cfs_alloc+0x5e/0x90
[<ffffffff8813f900>] :obdclass:class_config_llog_handler+0x0/0x1c50
[<ffffffff88136a9d>] :obdclass:class_config_parse_llog+0x18d/0x5e0
[<ffffffff8028b96c>] dput+0x35/0x116
[<ffffffff880bb01c>] :lvfs:lustre_rename+0x16c/0x580
[<ffffffff88394ae8>] :mgc:mgc_process_log+0x278/0x2780
[<ffffffff88397910>] :mgc:mgc_blocking_ast+0x0/0x4b0
[<ffffffff881b0260>] :ptlrpc:ldlm_completion_ast+0x0/0x770
[<ffffffff8031fdaf>] vsnprintf+0x54d/0x593
[<ffffffff8839a0a1>] :mgc:mgc_name2resid+0xd1/0x190
[<ffffffff8839455c>] :mgc:config_log_find+0x6c/0x380
[<ffffffff8839adfa>] :mgc:mgc_process_config+0xc2a/0x1130
[<ffffffff88144524>] :obdclass:lustre_process_log+0x3b4/0xfe0
[<ffffffff88145208>] :obdclass:server_find_mount+0x48/0x1c0
[<ffffffff881498c1>] :obdclass:server_start_targets+0xcc1/0x1ab0
[<ffffffff8814f7f2>] :obdclass:server_fill_super+0x14a2/0x22d0
[<ffffffff8809a2ee>] :libcfs:cfs_alloc+0x5e/0x90
[<ffffffff881514ee>] :obdclass:lustre_fill_super+0xece/0x17d4
[<ffffffff8027cff2>] set_anon_super+0x4b/0xb4
[<ffffffff8027d898>] sget+0x378/0x38a
[<ffffffff8027cfa7>] set_anon_super+0x0/0xb4
[<ffffffff88150620>] :obdclass:lustre_fill_super+0x0/0x17d4
[<ffffffff8027de2a>] get_sb_nodev+0x57/0x97
[<ffffffff88141676>] :obdclass:lustre_get_sb+0x16/0x20
[<ffffffff8027ce77>] vfs_kern_mount+0x52/0x8e
[<ffffffff8027cf0c>] do_kern_mount+0x47/0xe2
[<ffffffff80292573>] do_mount+0x671/0x6cb
[<ffffffff8031e8eb>] __up_read+0x8f/0x98
[<ffffffff80247b63>] up_read+0x9/0xb
[<ffffffff804dc675>] do_page_fault+0x447/0x7a8
[<ffffffff802841d6>] release_open_intent+0x17/0x20
[<ffffffff8025f49d>] __get_free_pages+0x32/0x6b
[<ffffffff80290747>] copy_mount_options+0x2f/0x136
[<ffffffff80292656>] sys_mount+0x89/0xd7
[<ffffffff8027a969>] do_sys_open+0x7d/0x8d
[<ffffffff802095fe>] system_call+0x7e/0x83
Unable to handle kernel paging request at fffffffff4482d60 RIP: 
[<ffffffff804d893a>] schedule+0x5e2/0x8df
PGD 203067 PUD 529f067 PMD 0 
Oops: 0000 [1] SMP 
CPU 3 
Modules linked in: mds fsfilt_ldiskfs mgs mgc lustre lov mdc lquota osc
ksocklnd ptlrpc obdclass lnet lvfs libcfs ldiskfs crc16 ipmi_devintf
bonding dm_snapshot dm_mirror dm_mod ipmi_si ipmi_msghandler
Pid: 19019, comm: mount.lustre Not tainted 2.6.22.19-lustre-1.6.5.1 #2
RIP: 0010:[<ffffffff804d893a>]  [<ffffffff804d893a>] schedule+0x5e2/0x8df
RSP: 0000:ffff81010e7ba448  EFLAGS: 00010083
RAX: 000000000e7ba028 RBX: ffff81012f4a8b20 RCX: ffff81012f4a8b20
RDX: ffffffff806fb0c0 RSI: 0000000000000000 RDI: ffff81012f4a8b20
RBP: ffff81010e7ba528 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000080 R11: ffffffff805ae781 R12: ffff8100052c5a4c
R13: ffff81010e7ba5b8 R14: ffff8100052c4800 R15: 00000eb61b0422d7
FS:  00002b7f59abe6d0(0000) GS:ffff81012ff073c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: fffffffff4482d60 CR3: 000000010257a000 CR4: 00000000000006e0
Process mount.lustre (pid: 19019, threadinfo ffff81010e7ba000, task ffff81012f4a8b20)
Stack:  ffff81012fa1d868 0000000000000000 ffff81010e7ba5b8 ffff81010e7ba4d8
ffff81010e7ba498 ffffffff806f8800 ffff81010e7ba498 0000000000000086
ffff81012fa1d7a8 ffff81012fa1d7a8 0000000000000004 ffff81012f4a8b20
Call Trace:
[<ffffffff80314236>] generic_unplug_device+0x20/0x32
[<ffffffff804d8d58>] io_schedule+0x2d/0x39
[<ffffffff8029bc3e>] sync_buffer+0x3b/0x3f
[<ffffffff804d9349>] __wait_on_bit+0x47/0x79
[<ffffffff8029bc03>] sync_buffer+0x0/0x3f
[<ffffffff8029bc03>] sync_buffer+0x0/0x3f
[<ffffffff804d93e5>] out_of_line_wait_on_bit+0x6a/0x77
[<ffffffff802451dc>] wake_bit_function+0x0/0x2a
[<ffffffff8029c095>] ll_rw_block+0x95/0xbc
[<ffffffff8029bbc1>] __wait_on_buffer+0x20/0x22
[<ffffffff88065fc9>] :ldiskfs:ldiskfs_bread+0x59/0x80
[<ffffffff883ce506>] :fsfilt_ldiskfs:fsfilt_ldiskfs_read_record+0x106/0x210
[<ffffffff8025f1bc>] __alloc_pages+0x83/0x2c6
[<ffffffff88110da8>] :obdclass:llog_lvfs_read_blob+0x58/0x220
[<ffffffff80278b92>] cache_alloc_refill+0x84/0x4ee
[<ffffffff8027a74b>] __dentry_open+0x111/0x1bd
[<ffffffff88111811>] :obdclass:llog_lvfs_read_header+0x1a1/0x440
[<ffffffff8809a2ee>] :libcfs:cfs_alloc+0x5e/0x90
[<ffffffff8810d183>] :obdclass:llog_init_handle+0xe3/0x8a0
[<ffffffff8029bbc1>] __wait_on_buffer+0x20/0x22
[<ffffffff88065fc9>] :ldiskfs:ldiskfs_bread+0x59/0x80
[<ffffffff8810dd6f>] :obdclass:llog_cat_id2handle+0x18f/0x630
[<ffffffff8811762b>] :obdclass:cat_cancel_cb+0x5b/0x6a0
[<ffffffff8810c96e>] :obdclass:llog_process+0x69e/0xdd0
[<ffffffff802909d3>] mntput_no_expire+0x20/0x7d
[<ffffffff881175d0>] :obdclass:cat_cancel_cb+0x0/0x6a0
[<ffffffff880bc10e>] :lvfs:pop_ctxt+0xae/0x2e0
[<ffffffff88118644>] :obdclass:llog_obd_origin_setup+0x684/0xb00
[<ffffffff88118f8e>] :obdclass:llog_setup+0x4ce/0x840
[<ffffffff8826f84f>] :osc:osc_llog_init+0x12f/0x410
[<ffffffff8809a2ee>] :libcfs:cfs_alloc+0x5e/0x90
[<ffffffff881166fe>] :obdclass:obd_llog_init+0xae/0x240
[<ffffffff8029bc03>] sync_buffer+0x0/0x3f
[<ffffffff882d8014>] :lov:lov_llog_init+0x274/0x440
[<ffffffff881166fe>] :obdclass:obd_llog_init+0xae/0x240
[<ffffffff883d95b5>] :mds:mds_llog_init+0x1d5/0x280
[<ffffffff881166fe>] :obdclass:obd_llog_init+0xae/0x240
[<ffffffff8026febc>] __vmalloc_node+0x58/0x65
[<ffffffff88116a55>] :obdclass:llog_cat_initialize+0x1c5/0x690
[<ffffffff882f13d8>] :lov:lov_get_info+0x98/0xbf0
[<ffffffff883e2aec>] :mds:mds_lov_update_desc+0x25c/0x9f0
[<ffffffff883ea462>] :mds:mds_lov_connect+0x7e2/0x1b70
[<ffffffff8809a2ee>] :libcfs:cfs_alloc+0x5e/0x90
[<ffffffff88118f66>] :obdclass:llog_setup+0x4a6/0x840
[<ffffffff88137ab7>] :obdclass:class_get_profile+0x67/0x1d0
[<ffffffff883f0e5b>] :mds:mds_setup+0x10fb/0x1bf0
[<ffffffff80278b92>] cache_alloc_refill+0x84/0x4ee
[<ffffffff8809a2ee>] :libcfs:cfs_alloc+0x5e/0x90
[<ffffffff88124766>] :obdclass:class_new_export+0x1f6/0x550
[<ffffffff8813b271>] :obdclass:class_setup+0x7f1/0xcd0
[<ffffffff88122009>] :obdclass:class_name2dev+0x59/0xe0
[<ffffffff8813e1ab>] :obdclass:class_process_config+0x147b/0x1c70
[<ffffffff881406ef>] :obdclass:class_config_llog_handler+0xdef/0x1c50
[<ffffffff8027a8da>] do_filp_open+0x39/0x4b
[<ffffffff88110da8>] :obdclass:llog_lvfs_read_blob+0x58/0x220
[<ffffffff8810c96e>] :obdclass:llog_process+0x69e/0xdd0
[<ffffffff8809a2ee>] :libcfs:cfs_alloc+0x5e/0x90
[<ffffffff8813f900>] :obdclass:class_config_llog_handler+0x0/0x1c50
[<ffffffff88136a9d>] :obdclass:class_config_parse_llog+0x18d/0x5e0
[<ffffffff8028b96c>] dput+0x35/0x116
[<ffffffff880bb01c>] :lvfs:lustre_rename+0x16c/0x580
[<ffffffff88394ae8>] :mgc:mgc_process_log+0x278/0x2780
[<ffffffff88397910>] :mgc:mgc_blocking_ast+0x0/0x4b0
[<ffffffff881b0260>] :ptlrpc:ldlm_completion_ast+0x0/0x770
[<ffffffff8031fdaf>] vsnprintf+0x54d/0x593
[<ffffffff8839a0a1>] :mgc:mgc_name2resid+0xd1/0x190
[<ffffffff8839455c>] :mgc:config_log_find+0x6c/0x380
[<ffffffff8839adfa>] :mgc:mgc_process_config+0xc2a/0x1130
[<ffffffff88144524>] :obdclass:lustre_process_log+0x3b4/0xfe0
[<ffffffff88145208>] :obdclass:server_find_mount+0x48/0x1c0
[<ffffffff881498c1>] :obdclass:server_start_targets+0xcc1/0x1ab0
[<ffffffff8814f7f2>] :obdclass:server_fill_super+0x14a2/0x22d0
[<ffffffff8809a2ee>] :libcfs:cfs_alloc+0x5e/0x90
[<ffffffff881514ee>] :obdclass:lustre_fill_super+0xece/0x17d4
[<ffffffff8027cff2>] set_anon_super+0x4b/0xb4
[<ffffffff8027d898>] sget+0x378/0x38a
[<ffffffff8027cfa7>] set_anon_super+0x0/0xb4
[<ffffffff88150620>] :obdclass:lustre_fill_super+0x0/0x17d4
[<ffffffff8027de2a>] get_sb_nodev+0x57/0x97
[<ffffffff88141676>] :obdclass:lustre_get_sb+0x16/0x20
[<ffffffff8027ce77>] vfs_kern_mount+0x52/0x8e
[<ffffffff8027cf0c>] do_kern_mount+0x47/0xe2
[<ffffffff80292573>] do_mount+0x671/0x6cb
[<ffffffff8031e8eb>] __up_read+0x8f/0x98
[<ffffffff80247b63>] up_read+0x9/0xb
[<ffffffff804dc675>] do_page_fault+0x447/0x7a8
[<ffffffff802841d6>] release_open_intent+0x17/0x20
[<ffffffff8025f49d>] __get_free_pages+0x32/0x6b
[<ffffffff80290747>] copy_mount_options+0x2f/0x136
[<ffffffff80292656>] sys_mount+0x89/0xd7
[<ffffffff8027a969>] do_sys_open+0x7d/0x8d
[<ffffffff802095fe>] system_call+0x7e/0x83
Code: 48 8b 04 c5 20 2c 6b 80 48 8b 40 08 c7 44 02 08 01 00 00 00 
RIP  [<ffffffff804d893a>] schedule+0x5e2/0x8df
RSP <ffff81010e7ba448>
CR2: fffffffff4482d60
Kernel panic - not syncing: Aiee, killing interrupt handler!






