[Lustre-discuss] [RESOLVED] Strange MDS Problem + Resolution
Aaron Knister
aaron.knister at gmail.com
Sun Sep 27 15:46:14 PDT 2009
I wanted to post this here so that anybody else who stumbles across
this problem doesn't spend hours banging their head against a brick
wall. I was helping with a Lustre disk setup that kept crashing. The
Lustre filesystem would hang and one thread (ll_mdt_[0-9]*) would be
pegged at 100% of a CPU. It turns out there were some on-disk
inconsistencies as a result of the MDS crashing because it ran out of
memory. A simple fsck of the MDT fixed the issue, after many hours of
attempted debugging. We didn't think the problem could be fixed by a
simple fsck... but it makes sense.
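For anyone wanting to repeat the check, here is a minimal sketch of how the fsck invocation might be assembled. The exact command used is not recorded in this post, so everything here is an assumption: that the tool is e2fsck (ideally from the Lustre-patched e2fsprogs, since the MDT is ldiskfs), that the MDT is unmounted first, and that /dev/MDT_DEVICE is a placeholder for the real device. The helper name build_mdt_check_command is hypothetical. It defaults to a read-only pass (-n) so nothing is modified until the report has been reviewed.

```python
import shlex


def build_mdt_check_command(mdt_device, read_only=True):
    """Return the argv list for a forced e2fsck of an (unmounted) MDT device.

    Assumption: e2fsck is the checker in use; the original post only says
    "a simple fsck of the MDT".
    """
    if not mdt_device:
        raise ValueError("mdt_device must be a non-empty path")
    argv = ["e2fsck", "-f"]  # -f: check even if the filesystem looks clean
    if read_only:
        argv.append("-n")    # -n: open read-only and answer "no" to all prompts
    argv.append(mdt_device)
    return argv


if __name__ == "__main__":
    # /dev/MDT_DEVICE is a placeholder, not a real path from this post.
    # Only the command line is printed; nothing is executed.
    print(shlex.join(build_mdt_check_command("/dev/MDT_DEVICE")))
```

Calling the helper with read_only=False drops -n, in which case e2fsck will prompt before making each repair.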
Here's the call trace:
BUG: soft lockup - CPU#0 stuck for 10s! [ll_mdt_26:12829]
CPU 0:
Modules linked in: mds(U) fsfilt_ldiskfs(U) mgs(U) mgc(U) ldiskfs(U)
lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ko2iblnd(U)
ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) rdma_ucm(U) ib_sdp(U)
rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U)
ib_sa(U) ib_uverbs(U) ib_umad(U) iw_cxgb3(U) cxgb3(U) ib_ipath(U)
mlx4_ib(U) mlx4_core(U) ib_mthca(U) ib_mad(U) ib_core(U) crc16(U)
ipmi_devintf(U) mptctl(U) mptbase(U) ipmi_si(U) ipmi_msghandler(U)
dell_rbu(U) autofs4(U) hidp(U) rfcomm(U) l2cap(U) bluetooth(U)
sunrpc(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) dm_multipath(U) video(U)
sbs(U) backlight(U) i2c_ec(U) i2c_core(U) button(U) battery(U)
asus_acpi(U)
acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) joydev(U)
pata_acpi(U) ata_piix(U) libata(U) sr_mod(U) sg(U) shpchp(U) ide_cd(U)
i5000_edac(U) bnx2(U) serio_raw(U) edac_mc(U) cdrom(U) pcspkr(U)
dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_mod(U) usb_storage(U)
megaraid_sas(U) sd_mod(U) scsi_mod(U) ext3(U) jb
(U) ehci_hcd(U) ohci_hcd(U) uhci_hcd(U)
Pid: 12829, comm: ll_mdt_26 Tainted: G 2.6.18-92.1.10.el5_lustre.1.6.6smp #1
RIP: 0010:[<ffffffff887ff8bc>] [<ffffffff887ff8bc>] :ldiskfs:do_split+0x3ec/0x560
RSP: 0018:ffff8103f4fab470 EFLAGS: 00000206
RAX: 0000000000000000 RBX: 0000000000000024 RCX: 0000000000000000
RDX: 0000000000000024 RSI: ffff8103aa719bb0 RDI: ffff8103aa719800
RBP: ffff8103fdd50d30 R08: 383030322e786e39 R09: 0000000031323730
R10: 000000006a3ef844 R11: ffff8103aa719cf8 R12: ffff81018bb81f70
R13: ffff81017ad46f70 R14: ffff810093d3cc10 R15: 0000000000000000
FS: 00002b55d25bc220(0000) GS:ffffffff803eb000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000f1bff000 CR3: 00000004159d4000 CR4: 00000000000006e0
Call Trace:
[<ffffffff88800395>] :ldiskfs:ldiskfs_add_entry+0x4f5/0x980
[<ffffffff8006d8f0>] do_gettimeofday+0x50/0x92
[<ffffffff88800e36>] :ldiskfs:ldiskfs_add_nondir+0x26/0x90
[<ffffffff88801756>] :ldiskfs:ldiskfs_create+0xf6/0x140
[<ffffffff888802ff>] :fsfilt_ldiskfs:fsfilt_ldiskfs_start+0x55f/0x630
[<ffffffff8003a049>] vfs_create+0xe6/0x158
[<ffffffff88b10453>] :mds:mds_open+0x15a3/0x332e
[<ffffffff884c30e8>] :lvfs:entry_set_group_info+0xd8/0x2c0
[<ffffffff884c33fb>] :lvfs:alloc_entry+0x12b/0x140
[<ffffffff88666434>] :ko2iblnd:kiblnd_check_sends+0x644/0x7f0
[<ffffffff88546031>] :obdclass:class_handle2object+0xd1/0x160
[<ffffffff885a619e>] :ptlrpc:lock_res_and_lock+0xbe/0xe0
[<ffffffff88aed889>] :mds:mds_reint_rec+0x1d9/0x2b0
[<ffffffff88b14143>] :mds:mds_open_unpack+0x2f3/0x410
[<ffffffff88ae08da>] :mds:mds_reint+0x35a/0x420
[<ffffffff88adef62>] :mds:fixup_handle_for_resent_req+0x52/0x200
[<ffffffff88ae492c>] :mds:mds_intent_policy+0x48c/0xc40
[<ffffffff885db765>] :ptlrpc:ptlrpc_prep_set+0x1f5/0x2a0
[<ffffffff885ab926>] :ptlrpc:ldlm_lock_enqueue+0x186/0x990
[<ffffffff885a7a24>] :ptlrpc:ldlm_lock_remove_from_lru+0x74/0xe0
[<ffffffff885cd5c0>] :ptlrpc:ldlm_server_completion_ast+0x0/0x5c0
[<ffffffff885cae85>] :ptlrpc:ldlm_handle_enqueue+0xca5/0x12a0
[<ffffffff885cdb80>] :ptlrpc:ldlm_server_blocking_ast+0x0/0x6b2
[<ffffffff88ae9115>] :mds:mds_handle+0x4035/0x4cf0
[<ffffffff80143a09>] __next_cpu+0x19/0x28
[<ffffffff80089ab6>] find_busiest_group+0x20d/0x621
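To make it easier to spot the same symptom in a syslog, here is a rough sketch that pulls the stuck thread and the function at RIP out of a soft-lockup report like the one above. The function name summarize_soft_lockup and the regular expressions are my own illustration, written against the 2.6.18-style format in this trace; other kernels format these lines differently. A match only says the symptom looks the same (an ll_mdt thread spinning in ldiskfs do_split); it does not prove on-disk corruption.

```python
import re

# "BUG: soft lockup - CPU#0 stuck for 10s! [ll_mdt_26:12829]"
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)
# "RIP: 0010:[<addr>] [<addr>] :module:function+0x..." (module is optional)
RIP_RE = re.compile(
    r"RIP:\s+\S+\s+\[<[0-9a-f]+>\]\s+"
    r"(?::(?P<module>\w+):)?(?P<func>\w+)\s*\+0x"
)


def summarize_soft_lockup(log_text):
    """Return a dict describing the first soft lockup in log_text, or None."""
    # Collapse all whitespace so mail-wrapped lines are parsed as one stream.
    flat = " ".join(log_text.split())
    lockup = LOCKUP_RE.search(flat)
    if lockup is None:
        return None
    summary = {
        "cpu": int(lockup["cpu"]),
        "stuck_seconds": int(lockup["secs"]),
        "comm": lockup["comm"],
        "pid": int(lockup["pid"]),
        "module": None,
        "function": None,
    }
    # Look for the RIP line that follows the lockup banner.
    rip = RIP_RE.search(flat, lockup.end())
    if rip is not None:
        summary["module"] = rip["module"]
        summary["function"] = rip["func"]
    return summary


# Excerpt copied from the trace in this post, including the mail wrapping.
SAMPLE = """\
BUG: soft lockup - CPU#0 stuck for 10s! [ll_mdt_26:12829]
CPU 0:
Pid: 12829, comm: ll_mdt_26 Tainted: G 2.6.18-92.1.10.el5_lustre.
1.6.6smp #1
RIP: 0010:[<ffffffff887ff8bc>] [<ffffffff887ff8bc>] :ldiskfs:do_split
+0x3ec/0x560
"""

if __name__ == "__main__":
    print(summarize_soft_lockup(SAMPLE))
```

Whitespace is collapsed before matching because archived mail wraps the RIP line, which would otherwise defeat a line-by-line parser.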