[Lustre-discuss] Kernel BUG at OSS server

Patricia Santos Marco psantos at bifi.es
Fri Jul 24 01:35:54 PDT 2009


Hello! We have a Lustre cluster with two OSS servers running the
"2.6.16.60-0.31_lustre.1.6.7-smp" kernel. The system was installed a
month ago. The servers have 200 clients and everything has worked well, but
a few days ago one of the OSS servers crashed. This is the log message:

Jul 21 15:40:56 lxsrv3 kernel: Assertion failure in journal_start() at fs/jbd/transaction.c:282: "handle->h_transaction->t_journal == journal"
Jul 21 15:40:56 lxsrv3 kernel: ----------- [cut here ] --------- [please bite here ] ---------
Jul 21 15:40:56 lxsrv3 kernel: Kernel BUG at fs/jbd/transaction.c:282
Jul 21 15:40:56 lxsrv3 kernel: invalid opcode: 0000 [1] SMP
Jul 21 15:40:56 lxsrv3 kernel: last sysfs file: /devices/system/cpu/cpu0/cpufreq/scaling_max_freq
Jul 21 15:40:56 lxsrv3 kernel: CPU 5

Jul 21 15:40:56 lxsrv3 kernel: Modules linked in: af_packet quota_v2 nfs xt_pkttype ipt_LOG xt_limit obdfilter fsfilt_ldiskfs ost mgc ldiskfs crc16 lustre lov mdc lquota osc ksocklnd ptlrpc obdclass lnet lvfs libcfs nfsd exportfs lockd nfs_acl sunrpc cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_powersave speedstep_centrino freq_table button battery ac ip6t_REJECT xt_tcpudp ipt_REJECT xt_state iptable_mangle iptable_nat ip_nat iptable_filter ip6table_mangle ip_conntrack nfnetlink ip_tables ip6table_filter ip6_tables x_tables ipv6 loop dm_mod uhci_hcd ehci_hcd shpchp ide_cd i2c_i801 cdrom e1000 usbcore pci_hotplug i2c_core hw_random megaraid_sas ext3 jbd sg edd fan mptsas mptscsih mptbase scsi_transport_sas ahci libata piix thermal processor sd_mod scsi_mod ide_disk ide_core

Jul 21 15:40:56 lxsrv3 kernel: Pid: 4978, comm: ll_ost_io_91 Tainted: G U 2.6.16.60-0.31_lustre.1.6.7-smp #1
Jul 21 15:40:56 lxsrv3 kernel: RIP: 0010:[<ffffffff881203a5>] <ffffffff881203a5>{:jbd:journal_start+98}
Jul 21 15:40:56 lxsrv3 kernel: RSP: 0000:ffff8104393cd348  EFLAGS: 00010292
Jul 21 15:40:56 lxsrv3 kernel: RAX: 0000000000000073 RBX: ffff810364a5d4f8 RCX: 0000000000000292
Jul 21 15:40:56 lxsrv3 kernel: RDX: ffffffff8034e968 RSI: 0000000000000296 RDI: ffffffff8034e960
Jul 21 15:40:56 lxsrv3 kernel: RBP: ffff81044426b400 R08: ffffffff8034e968 R09: ffff81044c47b580
Jul 21 15:40:56 lxsrv3 kernel: R10: ffff810001071680 R11: ffffffff803c8000 R12: 0000000000000012
Jul 21 15:40:56 lxsrv3 kernel: R13: ffff8104393cd3d8 R14: 0000000000000080 R15: 0000000000000180
Jul 21 15:40:56 lxsrv3 kernel: FS:  00002b239e35a6f0(0000) GS:ffff81044f1a66c0(0000) knlGS:0000000000000000
Jul 21 15:40:56 lxsrv3 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 21 15:40:56 lxsrv3 kernel: CR2: 00002af6312fa000 CR3: 00000004469fd000 CR4: 00000000000006e0
Jul 21 15:40:56 lxsrv3 kernel: Process ll_ost_io_91 (pid: 4978, threadinfo ffff8104393cc000, task ffff8104392a50c0)
Jul 21 15:40:56 lxsrv3 kernel: Stack: ffff8103ea582260 ffff8103ea582498 ffff8103ea582260 ffffffff8873f4bc
Jul 21 15:40:56 lxsrv3 kernel:        ffff8103ea582260 ffff8103ea582498 0000000000000000 ffffffff80199ab3
Jul 21 15:40:56 lxsrv3 kernel:        ffff8104393cd248 ffff8103ea582270
Jul 21 15:40:56 lxsrv3 kernel: Call Trace: <ffffffff8873f4bc>{:ldiskfs:ldiskfs_dquot_drop+60}
Jul 21 15:40:56 lxsrv3 kernel:        <ffffffff80199ab3>{clear_inode+182} <ffffffff80199e03>{dispose_list+86}
Jul 21 15:40:56 lxsrv3 kernel:        <ffffffff8019a045>{shrink_icache_memory+418} <ffffffff80167db3>{shrink_slab+226}
Jul 21 15:40:56 lxsrv3 kernel:        <ffffffff80168b8d>{try_to_free_pages+408} <ffffffff8016398b>{__alloc_pages+449}
Jul 21 15:40:56 lxsrv3 kernel:        <ffffffff88124ba4>{:jbd:find_revoke_record+98} <ffffffff8015f3bb>{find_or_create_page+53}
Jul 21 15:40:56 lxsrv3 kernel:        <ffffffff88731d61>{:ldiskfs:ldiskfs_truncate+241} <ffffffff80184040>{__getblk+29}
Jul 21 15:40:56 lxsrv3 kernel:        <ffffffff8016b778>{unmap_mapping_range+89} <ffffffff8872fbb7>{:ldiskfs:ldiskfs_mark_iloc_dirty+1047}
Jul 21 15:40:56 lxsrv3 kernel:        <ffffffff8016d227>{vmtruncate+162} <ffffffff8019aabb>{inode_setattr+34}
Jul 21 15:40:56 lxsrv3 kernel:        <ffffffff8873343b>{:ldiskfs:ldiskfs_setattr+459} <ffffffff8879f4cf>{:fsfilt_ldiskfs:fsfilt_ldiskfs_setattr+287}

What is the problem?