[Lustre-discuss] Kernel BUG at OSS server
Walter Poxon
walter at routingdynamics.com
Fri Jul 24 05:58:57 PDT 2009
On Jul 24, 2009, at 3:35 AM, Patricia Santos Marco wrote:
>
>
> Hello! We have a Lustre cluster with two OSS servers running the
> "2.6.16.60-0.31_lustre.1.6.7-smp" kernel. The system has been in
> service for a month. The servers have 200 clients and everything has
> worked well, but a few days ago one of the OSS servers crashed. This
> is the log message:
>
> Jul 21 15:40:56 lxsrv3 kernel: Assertion failure in journal_start() at fs/jbd/transaction.c:282: "handle->h_transaction->t_journal == journal"
> Jul 21 15:40:56 lxsrv3 kernel: ----------- [cut here ] --------- [please bite here ] ---------
> Jul 21 15:40:56 lxsrv3 kernel: Kernel BUG at fs/jbd/transaction.c:282
> Jul 21 15:40:56 lxsrv3 kernel: invalid opcode: 0000 [1] SMP
> Jul 21 15:40:56 lxsrv3 kernel: last sysfs file: /devices/system/cpu/cpu0/cpufreq/scaling_max_freq
> Jul 21 15:40:56 lxsrv3 kernel: CPU 5
> Jul 21 15:40:56 lxsrv3 kernel: Modules linked in: af_packet quota_v2 nfs xt_pkttype ipt_LOG xt_limit obdfilter fsfilt_ldiskfs ost mgc ldiskfs crc16 lustre lov mdc lquota osc ksocklnd ptlrpc obdclass lnet lvfs libcfs nfsd exportfs lockd nfs_acl sunrpc cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_powersave speedstep_centrino freq_table button battery ac ip6t_REJECT xt_tcpudp ipt_REJECT xt_state iptable_mangle iptable_nat ip_nat iptable_filter ip6table_mangle ip_conntrack nfnetlink ip_tables ip6table_filter ip6_tables x_tables ipv6 loop dm_mod uhci_hcd ehci_hcd shpchp ide_cd i2c_i801 cdrom e1000 usbcore pci_hotplug i2c_core hw_random megaraid_sas ext3 jbd sg edd fan mptsas mptscsih mptbase scsi_transport_sas ahci libata piix thermal processor sd_mod scsi_mod ide_disk ide_core
> Jul 21 15:40:56 lxsrv3 kernel: Pid: 4978, comm: ll_ost_io_91 Tainted: G U 2.6.16.60-0.31_lustre.1.6.7-smp #1
> Jul 21 15:40:56 lxsrv3 kernel: RIP: 0010:[<ffffffff881203a5>] <ffffffff881203a5>{:jbd:journal_start+98}
> Jul 21 15:40:56 lxsrv3 kernel: RSP: 0000:ffff8104393cd348 EFLAGS: 00010292
> Jul 21 15:40:56 lxsrv3 kernel: RAX: 0000000000000073 RBX: ffff810364a5d4f8 RCX: 0000000000000292
> Jul 21 15:40:56 lxsrv3 kernel: RDX: ffffffff8034e968 RSI: 0000000000000296 RDI: ffffffff8034e960
> Jul 21 15:40:56 lxsrv3 kernel: RBP: ffff81044426b400 R08: ffffffff8034e968 R09: ffff81044c47b580
> Jul 21 15:40:56 lxsrv3 kernel: R10: ffff810001071680 R11: ffffffff803c8000 R12: 0000000000000012
> Jul 21 15:40:56 lxsrv3 kernel: R13: ffff8104393cd3d8 R14: 0000000000000080 R15: 0000000000000180
> Jul 21 15:40:56 lxsrv3 kernel: FS: 00002b239e35a6f0(0000) GS:ffff81044f1a66c0(0000) knlGS:0000000000000000
> Jul 21 15:40:56 lxsrv3 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Jul 21 15:40:56 lxsrv3 kernel: CR2: 00002af6312fa000 CR3: 00000004469fd000 CR4: 00000000000006e0
> Jul 21 15:40:56 lxsrv3 kernel: Process ll_ost_io_91 (pid: 4978, threadinfo ffff8104393cc000, task ffff8104392a50c0)
> Jul 21 15:40:56 lxsrv3 kernel: Stack: ffff8103ea582260 ffff8103ea582498 ffff8103ea582260 ffffffff8873f4bc
> Jul 21 15:40:56 lxsrv3 kernel: ffff8103ea582260 ffff8103ea582498 0000000000000000 ffffffff80199ab3
> Jul 21 15:40:56 lxsrv3 kernel: ffff8104393cd248 ffff8103ea582270
> Jul 21 15:40:56 lxsrv3 kernel: Call Trace: <ffffffff8873f4bc>{:ldiskfs:ldiskfs_dquot_drop+60}
> Jul 21 15:40:56 lxsrv3 kernel: <ffffffff80199ab3>{clear_inode+182} <ffffffff80199e03>{dispose_list+86}
> Jul 21 15:40:56 lxsrv3 kernel: <ffffffff8019a045>{shrink_icache_memory+418} <ffffffff80167db3>{shrink_slab+226}
> Jul 21 15:40:56 lxsrv3 kernel: <ffffffff80168b8d>{try_to_free_pages+408} <ffffffff8016398b>{__alloc_pages+449}
> Jul 21 15:40:56 lxsrv3 kernel: <ffffffff88124ba4>{:jbd:find_revoke_record+98} <ffffffff8015f3bb>{find_or_create_page+53}
> Jul 21 15:40:56 lxsrv3 kernel: <ffffffff88731d61>{:ldiskfs:ldiskfs_truncate+241} <ffffffff80184040>{__getblk+29}
> Jul 21 15:40:56 lxsrv3 kernel: <ffffffff8016b778>{unmap_mapping_range+89} <ffffffff8872fbb7>{:ldiskfs:ldiskfs_mark_iloc_dirty+1047}
> Jul 21 15:40:56 lxsrv3 kernel: <ffffffff8016d227>{vmtruncate+162} <ffffffff8019aabb>{inode_setattr+34}
> Jul 21 15:40:56 lxsrv3 kernel: <ffffffff8873343b>{:ldiskfs:ldiskfs_setattr+459} <ffffffff8879f4cf>{:fsfilt_ldiskfs:fsfilt_ldiskfs_setattr+287}
>
> What's the problem?
>
Patricia -
This looks similar to the problem described in Lustre bug 20008.
You may want to look at https://bugzilla.lustre.org/show_bug.cgi?id=20008
for more information, and add a comment there describing the crash you
experienced so the developers know that other sites are hitting the
same problem.
-walter