[Lustre-discuss] Panic problem

Marco Aurelio L Gomes mgomes at tpn.usp.br
Thu Jul 8 13:32:04 PDT 2010


Hi,

We have a Lustre 1.8.1 filesystem running on CentOS 5.4 with kernel
2.6.18-128.1.14.el5_lustre.1.8.1. For the past 3 days we have been
having kernel panics on our MDS machines (our setup has 2 MDS and 2
OSS). When a panic occurs, the other machine mounts the Lustre MDT
filesystem and becomes the primary MDS, but after recovery it panics
as well. I have attached the dump from the kernel panic and would like
to know whether anyone has seen this kind of problem and can help.

Many thanks in advance.

Regards,
-- 
Marco Gomes
Systems/HPC-Cluster
Numerical Offshore Tank
Naval and Ocean Engineering Department's Laboratory
Escola Politécnica
University of São Paulo
+55 11 3777 4142 ext. 250


-------------- next part --------------
LustreError: 12942:0:(pack_generic.c:655:lustre_shrink_reply_v2()) ASSERTION(msg->lm_bufcount > segment) failed
LustreError: 12942:0:(pack_generic.c:655:lustre_shrink_reply_v2()) LBUG
LustreError: dumping log to /tmp/lustre-log.1278619163.12942
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at ...s/root/usr/local/src/aufs.wcvs/aufs/fs/aufs/f_op.c:706
invalid opcode: 0000 [1] SMP 
last sysfs file: /devices/pci0000:00/0000:00:07.0/0000:08:00.0/host4/rport-4:0-0/target4:0:0/4:0:0:1/state
CPU 0 
Modules linked in: mds(U) fsfilt_ldiskfs(U) mgs(U) mgc(U) ldiskfs(U) crc16(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) rdma_ucm(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U) ib_sa(U) ib_uverbs(U) ib_umad(U) iw_nes(U) iw_cxgb3(U) cxgb3(U) ib_qib(U) mlx4_en(U) mlx4_ib(U) mlx4_core(U) ib_mthca(U) ib_mad(U) ib_core(U) lockd(U) sunrpc(U) ipoib_helper(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) dm_mirror(U) dm_log(U) dm_round_robin(U) scsi_dh_rdac(U) dm_multipath(U) scsi_dh(U) dm_mod(U) video(U) hwmon(U) backlight(U) sbs(U) i2c_ec(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) joydev(U) shpchp(U) sg(U) ehci_hcd(U) uhci_hcd(U) pcspkr(U) qla2xxx(U) scsi_transport_fc(U) i2c_i801(U) i2c_core(U) ata_piix(U) libata(U) sd_mod(U) scsi_mod(U) loop(U) squashfs(U) aufs(U) ext3(U) jbd(U) e1000e(U)
Pid: 13151, comm:  Tainted: G      2.6.18-128.1.14.el5_lustre.1.8.1 #1
RIP: 0010:[<ffffffff8808976a>]  [<ffffffff8808976a>] :aufs:aufs_fsync_nondir+0x86/0x380
RSP: 0000:ffff81066fed3e20  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8103320438c0 RCX: 0000000000000000
RDX: ffff8103352240d0 RSI: ffff81033202aa98 RDI: ffff8103490bc9c0
RBP: ffff81066fed3ec8 R08: 000000000000035a R09: ffff81037e612000
R10: 0000000000000080 R11: 0000000000000000 R12: ffff8103490bc9c0
R13: ffff8103490bc9c0 R14: ffff81033202aa98 R15: ffff8103352240d0
FS:  0000000000000000(0000) GS:ffffffff803f7000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b7ffc4dc3a0 CR3: 0000000000201000 CR4: 00000000000006e0
Process  (pid: 13151, threadinfo ffff81066fed2000, task ffff81067f8c0100)
Stack:  0000000100000000 ffff81010c44a040 ffff8103490bc9f8 ffffffff800d73e1
 ffff81010c476000 0000000000000000 ffff81033202aa98 0000000000100603
 ffff81010b2f65f0 0000000000000286 0000000000000282 ffff8103320438c0
Call Trace:
 [<ffffffff800d73e1>] cache_flusharray+0x74/0xa3
 [<ffffffff88778e18>] :libcfs:tracefile_dump_all_pages+0x288/0x2d0
 [<ffffffff8877605b>] :libcfs:libcfs_debug_dumplog_internal+0x8b/0xb0
 [<ffffffff88776098>] :libcfs:libcfs_debug_dumplog_thread+0x18/0x40
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff88776080>] :libcfs:libcfs_debug_dumplog_thread+0x0/0x40
 [<ffffffff8005dfa7>] child_rip+0x0/0x11


Code: 0f 0b 68 db 37 0a 88 c2 c2 02 49 39 d7 74 18 48 8d ba b8 00 
RIP  [<ffffffff8808976a>] :aufs:aufs_fsync_nondir+0x86/0x380
 RSP <ffff81066fed3e20>
 <0>Kernel panic - not syncing: Fatal exception
 <0>Dumping qib trace buffer from panic
Done dumping qib trace buffer


