[Lustre-discuss] Kernel BUG at OSS server

Walter Poxon walter at routingdynamics.com
Fri Jul 24 05:58:57 PDT 2009


On Jul 24, 2009, at 3:35 AM, Patricia Santos Marco wrote:

>
>
> Hello! We have a Lustre cluster with two OSS servers running the  
> "2.6.16.60-0.31_lustre.1.6.7-smp" kernel. The system has been  
> installed for about a month. The servers have 200 clients and  
> everything worked well, but the other day one of the OSS servers  
> crashed. This is the log message:
>
> Jul 21 15:40:56 lxsrv3 kernel: Assertion failure in journal_start() at fs/jbd/transaction.c:282: "handle->h_transaction->t_journal == journal"
> Jul 21 15:40:56 lxsrv3 kernel: ----------- [cut here ] --------- [please bite here ] ---------
> Jul 21 15:40:56 lxsrv3 kernel: Kernel BUG at fs/jbd/transaction.c:282
> Jul 21 15:40:56 lxsrv3 kernel: invalid opcode: 0000 [1] SMP
> Jul 21 15:40:56 lxsrv3 kernel: last sysfs file: /devices/system/cpu/cpu0/cpufreq/scaling_max_freq
> Jul 21 15:40:56 lxsrv3 kernel: CPU 5
> Jul 21 15:40:56 lxsrv3 kernel: Modules linked in: af_packet quota_v2 nfs xt_pkttype ipt_LOG xt_limit obdfilter fsfilt_ldiskfs ost mgc ldiskfs crc16 lustre lov mdc lquota osc ksocklnd ptlrpc obdclass lnet lvfs libcfs nfsd exportfs lockd nfs_acl sunrpc cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_powersave speedstep_centrino freq_table button battery ac ip6t_REJECT xt_tcpudp ipt_REJECT xt_state iptable_mangle iptable_nat ip_nat iptable_filter ip6table_mangle ip_conntrack nfnetlink ip_tables ip6table_filter ip6_tables x_tables ipv6 loop dm_mod uhci_hcd ehci_hcd shpchp ide_cd i2c_i801 cdrom e1000 usbcore pci_hotplug i2c_core hw_random megaraid_sas ext3 jbd sg edd fan mptsas mptscsih mptbase scsi_transport_sas ahci libata piix thermal processor sd_mod scsi_mod ide_disk ide_core
> Jul 21 15:40:56 lxsrv3 kernel: Pid: 4978, comm: ll_ost_io_91 Tainted: G     U 2.6.16.60-0.31_lustre.1.6.7-smp #1
> Jul 21 15:40:56 lxsrv3 kernel: RIP: 0010:[<ffffffff881203a5>] <ffffffff881203a5>{:jbd:journal_start+98}
> Jul 21 15:40:56 lxsrv3 kernel: RSP: 0000:ffff8104393cd348  EFLAGS: 00010292
> Jul 21 15:40:56 lxsrv3 kernel: RAX: 0000000000000073 RBX: ffff810364a5d4f8 RCX: 0000000000000292
> Jul 21 15:40:56 lxsrv3 kernel: RDX: ffffffff8034e968 RSI: 0000000000000296 RDI: ffffffff8034e960
> Jul 21 15:40:56 lxsrv3 kernel: RBP: ffff81044426b400 R08: ffffffff8034e968 R09: ffff81044c47b580
> Jul 21 15:40:56 lxsrv3 kernel: R10: ffff810001071680 R11: ffffffff803c8000 R12: 0000000000000012
> Jul 21 15:40:56 lxsrv3 kernel: R13: ffff8104393cd3d8 R14: 0000000000000080 R15: 0000000000000180
> Jul 21 15:40:56 lxsrv3 kernel: FS:  00002b239e35a6f0(0000) GS:ffff81044f1a66c0(0000) knlGS:0000000000000000
> Jul 21 15:40:56 lxsrv3 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Jul 21 15:40:56 lxsrv3 kernel: CR2: 00002af6312fa000 CR3: 00000004469fd000 CR4: 00000000000006e0
> Jul 21 15:40:56 lxsrv3 kernel: Process ll_ost_io_91 (pid: 4978, threadinfo ffff8104393cc000, task ffff8104392a50c0)
> Jul 21 15:40:56 lxsrv3 kernel: Stack: ffff8103ea582260 ffff8103ea582498 ffff8103ea582260 ffffffff8873f4bc
> Jul 21 15:40:56 lxsrv3 kernel:        ffff8103ea582260 ffff8103ea582498 0000000000000000 ffffffff80199ab3
> Jul 21 15:40:56 lxsrv3 kernel:        ffff8104393cd248 ffff8103ea582270
> Jul 21 15:40:56 lxsrv3 kernel: Call Trace: <ffffffff8873f4bc>{:ldiskfs:ldiskfs_dquot_drop+60}
> Jul 21 15:40:56 lxsrv3 kernel:        <ffffffff80199ab3>{clear_inode+182} <ffffffff80199e03>{dispose_list+86}
> Jul 21 15:40:56 lxsrv3 kernel:        <ffffffff8019a045>{shrink_icache_memory+418} <ffffffff80167db3>{shrink_slab+226}
> Jul 21 15:40:56 lxsrv3 kernel:        <ffffffff80168b8d>{try_to_free_pages+408} <ffffffff8016398b>{__alloc_pages+449}
> Jul 21 15:40:56 lxsrv3 kernel:        <ffffffff88124ba4>{:jbd:find_revoke_record+98} <ffffffff8015f3bb>{find_or_create_page+53}
> Jul 21 15:40:56 lxsrv3 kernel:        <ffffffff88731d61>{:ldiskfs:ldiskfs_truncate+241} <ffffffff80184040>{__getblk+29}
> Jul 21 15:40:56 lxsrv3 kernel:        <ffffffff8016b778>{unmap_mapping_range+89} <ffffffff8872fbb7>{:ldiskfs:ldiskfs_mark_iloc_dirty+1047}
> Jul 21 15:40:56 lxsrv3 kernel:        <ffffffff8016d227>{vmtruncate+162} <ffffffff8019aabb>{inode_setattr+34}
> Jul 21 15:40:56 lxsrv3 kernel:        <ffffffff8873343b>{:ldiskfs:ldiskfs_setattr+459} <ffffffff8879f4cf>{:fsfilt_ldiskfs:fsfilt_ldiskfs_setattr+287}
>
> What is the problem?
>

Patricia -

This looks similar to the problem described in Lustre bug 20008.

See https://bugzilla.lustre.org/show_bug.cgi?id=20008 for more
information, and please add a comment there describing the crash you
experienced, so it is known that other sites are hitting this problem
as well.

	-walter



