[Lustre-discuss] ASSERTION(PageLocked(bvl->bv_page)) failed

Tommi T tommi_t77 at yahoo.com
Fri Feb 24 09:01:47 PST 2012


Hello

I've been struggling this week with a quite simple Lustre setup: one MDT and one Sun X4540 OSS box. The OSS hung at some point because of a broken disk, and after a reboot it usually either hits an LBUG/assertion or panics.

e2fsck on the MDT and OSTs didn't find anything alarming, and a full lfsck found only some orphaned objects.

Next I'm going to roll the kernel back from 1.8.7-wc to the old version (1.8.3), but if that doesn't work I'm out of ideas :-(

Feb 24 17:22:28 sahara01 kernel: LustreError: 4883:0:(filter_io_26.c:178:dio_complete_routine()) ASSERTION(PageLocked(bvl->bv_page)) failed
Feb 24 17:22:30 sahara01 kernel: LustreError: 4883:0:(filter_io_26.c:178:dio_complete_routine()) LBUG
Feb 24 17:22:30 sahara01 kernel: LustreError: 4891:0:(filter_io_26.c:178:dio_complete_routine()) ASSERTION(PageLocked(bvl->bv_page)) failed
Feb 24 17:22:30 sahara01 kernel: LustreError: 4891:0:(filter_io_26.c:178:dio_complete_routine()) LBUG
Feb 24 17:22:30 sahara01 kernel: Pid: 4891, comm: md12_raid5
Feb 24 17:22:30 sahara01 kernel: 
Feb 24 17:22:30 sahara01 kernel: Call Trace:
Feb 24 17:22:30 sahara01 kernel:  [<ffffffff888286a1>] libcfs_debug_dumpstack+0x51/0x60 [libcfs]
Feb 24 17:22:30 sahara01 kernel:  [<ffffffff88828bda>] lbug_with_loc+0x7a/0xd0 [libcfs]
Feb 24 17:22:30 sahara01 kernel:  [<ffffffff88830fc0>] tracefile_init+0x0/0x110 [libcfs]
Feb 24 17:22:30 sahara01 kernel:  [<ffffffff88ce2248>] dio_complete_routine+0x1b8/0x2a0 [obdfilter]
Feb 24 17:22:30 sahara01 kernel:  [<ffffffff883bc277>] copy_data+0x169/0x17f [raid456]
Feb 24 17:22:30 sahara01 kernel:  [<ffffffff883c0519>] handle_stripe+0x223f/0x2567 [raid456]
Feb 24 17:22:30 sahara01 kernel:  [<ffffffff80062ff2>] thread_return+0x62/0xfe
Feb 24 17:22:30 sahara01 kernel:  [<ffffffff8021d333>] md_super_wait+0xb5/0xbc
Feb 24 17:22:30 sahara01 kernel:  [<ffffffff800a2be7>] keventd_create_kthread+0x0/0xc4
Feb 24 17:22:30 sahara01 kernel:  [<ffffffff800a2be7>] keventd_create_kthread+0x0/0xc4
Feb 24 17:22:30 sahara01 kernel:  [<ffffffff883c0999>] raid5d+0x158/0x18b [raid456]
Feb 24 17:22:30 sahara01 kernel:  [<ffffffff8003ac3b>] prepare_to_wait+0x34/0x61
Feb 24 17:22:30 sahara01 kernel:  [<ffffffff8022075b>] md_thread+0xf8/0x10e
Feb 24 17:22:30 sahara01 kernel:  [<ffffffff800a2dff>] autoremove_wake_function+0x0/0x2e
Feb 24 17:22:30 sahara01 kernel:  [<ffffffff80220663>] md_thread+0x0/0x10e
Feb 24 17:22:30 sahara01 kernel:  [<ffffffff8003276f>] kthread+0xfe/0x132
Feb 24 17:22:30 sahara01 kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11
Feb 24 17:22:30 sahara01 kernel:  [<ffffffff800a2be7>] keventd_create_kthread+0x0/0xc4
Feb 24 17:22:30 sahara01 kernel:  [<ffffffff80032671>] kthread+0x0/0x132
Feb 24 17:22:30 sahara01 kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11

or:

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at fs/bio.c:222
invalid opcode: 0000 [1] SMP 
last sysfs file: /class/infiniband_mad/umad0/port
CPU 10 
Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) ldiskfs(U) jbd2(U) crc16(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) ipmi_devintf(U) ipmi_si(U) ipmi_msghandler(U) lockd(U) sunrpc(U) ip_conntrack_netbios_ns(U) iptable_nat(U) ip_nat(U) ipt_REJECT(U) ipt_LOG(U) xt_limit(U) xt_state(U) ip_conntrack(U) nfnetlink(U) iptable_filter(U) ip_tables(U) ip6t_REJECT(U) xt_tcpudp(U) ip6table_filter(U) ip6_tables(U) x_tables(U) be2iscsi(U) ib_iser(U) iscsi_tcp(U) bnx2i(U) cnic(U) uio(U) cxgb3i(U) libcxgbi(U) cxgb3(U) libiscsi_tcp(U) libiscsi2(U) scsi_transport_iscsi2(U) scsi_transport_iscsi(U) ib_sdp(U) ib_ipoib(U) ipoib_helper(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) rdma_ucm(U) rdma_cm(U) ib_ucm(U) ib_uverbs(U) ib_umad(U) ib_cm(U) iw_cm(U) ib_addr(U) ib_sa(U) mlx4_ib(U) ib_mad(U) ib_core(U) mptctl(U) dm_mirror(U) dm_multipath(U) scsi_dh(U) raid456(U) xor(U) video(U) backlight(U)
 sbs(U) power_meter(U) i2c_ec(U) dell_wmi(U) wmi(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) joydev(U) mlx4_core(U) amd64_edac_mod(U) k10temp(U) sg(U) edac_mc(U) forcedeth(U) i2c_nforce2(U) hwmon(U) 8021q(U) tpm_tis(U) tpm(U) tpm_bios(U) i2c_core(U) pcspkr(U) dm_raid45(U) dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U) usb_storage(U) mptfc(U) scsi_transport_fc(U) mptspi(U) scsi_transport_spi(U) shpchp(U) mptsas(U) mptscsih(U) mptbase(U) scsi_transport_sas(U) sd_mod(U) scsi_mod(U) raid1(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
Pid: 4891, comm: md11_raid5 Tainted: G     ---- 2.6.18-274.3.1.el5_lustre.g9500ebf #1
RIP: 0010:[<ffffffff8002de97>]  [<ffffffff8002de97>] bio_put+0xa/0x31
RSP: 0018:ffff81082d2bdc78  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000100
RDX: ffff8103ea3baac0 RSI: 0000000000000001 RDI: ffff8103ea3baac0
RBP: ffff8108203edc08 R08: 0000000000000000 R09: 0000000000000036
R10: ffff81042e078000 R11: ffff810001000000 R12: ffff8103ea3baac0
R13: ffff81042e078000 R14: ffff8104017e28c0 R15: 0000000000000000
FS:  00002aca8f44e6e0(0000) GS:ffff81043e38fa40(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000036be69a830 CR3: 0000000000201000 CR4: 00000000000006e0
Process md11_raid5 (pid: 4891, threadinfo ffff81082d2bc000, task ffff810426c687a0)
Stack:  ffffffff88cd32f8 ffff8103e9ea4000 ffff81082d2bdde0 ffff81082d2bde0c
 0000000200000040 ffff81082d2bde00 0000000a2f091140 ffff81043e0c42a0
 0000000000000000 ffff81042fd4c1e0 0000000000000000 0000000100000140
Call Trace:
 [<ffffffff88cd32f8>] :obdfilter:dio_complete_routine+0x268/0x2a0
 [<ffffffff883c2519>] :raid456:handle_stripe+0x223f/0x2567
 [<ffffffff80062ff2>] thread_return+0x62/0xfe
 [<ffffffff800a2be7>] keventd_create_kthread+0x0/0xc4
 [<ffffffff800a2be7>] keventd_create_kthread+0x0/0xc4
 [<ffffffff883c2999>] :raid456:raid5d+0x158/0x18b
 [<ffffffff8003ac3b>] prepare_to_wait+0x34/0x61
 [<ffffffff8022075b>] md_thread+0xf8/0x10e
 [<ffffffff800a2dff>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80220663>] md_thread+0x0/0x10e
 [<ffffffff8003276f>] kthread+0xfe/0x132
 [<ffffffff80015f80>] do_exit+0x949/0x955
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff800a2be7>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032671>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11


Code: 0f 0b 68 b1 df 2b 80 c2 de 00 eb fe f0 ff 4f 50 0f 94 c0 84 
RIP  [<ffffffff8002de97>] bio_put+0xa/0x31
 RSP <ffff81082d2bdc78>
 <0>Kernel panic - not syncing: Fatal exception

BR,

Tommi



