[Lustre-discuss] oracle lustre 1.8.7 kernel panics on dell c6145 - AMD Opteron 6234

Lenny Shovsky lenny at wirewalk.com
Tue Jul 17 20:48:42 PDT 2012


This build has otherwise been very stable on similar Opteron 6174 platforms
and on many new Xeons,
but the Opteron 6234 seems to have issues.

Could this be SMP-related? Is anyone using AMD Opteron 6234 models, or Dell C6145s specifically?

The full crash output is here:

http://pastebin.com/AtvtCwXf

A sample is below.

Thanks a lot in advance!



AMD Opteron(TM) Processor 6234                  stepping 02
Brought up 48 CPUs
testing NMI watchdog ... OK.
time.c: Using 14.318180 MHz WALL HPET GTOD HPET/TSC timer.
time.c: Detected 2400.187 MHz processor.
divide error: 0000 [1] SMP
last sysfs file:
CPU 1
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.18-194.17.1.el5_lustre.1.8.7 #1
RIP: 0010:[<ffffffff8008bb03>]  [<ffffffff8008bb03>] find_busiest_group+0x23a/0x621
RSP: 0018:ffff81102805fdb8  EFLAGS: 00010046
RAX: 0000000000004000 RBX: 00000000000000ff RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000c0
RBP: ffff81102805fea8 R08: 0000000000000006 R09: 000000000000003a
R10: ffff810836279e08 R11: 0000000000000048 R12: ffff810836279e00
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000004000
FS:  0000000000000000(0000) GS:ffff81010e95eec0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff810828060000, task ffff8104280537a0)
Stack:  0000000000000000 ffff81102805fee8 ffff81102805ff10 0000000000000000
 ffff81102805ff08 000000010100caa0 ffff81000100e260 0000000000000000
 0000000000000000 0000000000000000 0000000000000080 0000000000000000
Call Trace:
 <IRQ>  [<ffffffff8008dba0>] rebalance_tick+0x183/0x3cb
 [<ffffffff8009829f>] update_process_times+0x68/0x78
 [<ffffffff80077bc3>] smp_local_timer_interrupt+0x2f/0x66
 [<ffffffff800781ff>] smp_apic_timer_interrupt+0x41/0x47
 [<ffffffff80057018>] mwait_idle+0x0/0x4a
 [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff8005704e>] mwait_idle+0x36/0x4a
 [<ffffffff80049206>] cpu_idle+0x95/0xb8
 [<ffffffff8007796b>] start_secondary+0x498/0x4a7


Code: 48 f7 f6 49 c1 ee 07 83 7d cc 00 74 1c 48 8b 55 d0 4c 89 a5
RIP  [<ffffffff8008bb03>] find_busiest_group+0x23a/0x621
 RSP <ffff81102805fdb8>
 <0>Kernel panic - not syncing: Fatal exception
divide error: 0000 [2] SMP
last sysfs file:
CPU 5
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.18-194.17.1.el5_lustre.1.8.7 #1
RIP: 0010:[<ffffffff8008bb03>]  [<ffffffff8008bb03>] find_busiest_group+0x23a/0x621
RSP: 0018:ffff81083616bdb8  EFLAGS: 00010046
RAX: 0000000000004000 RBX: 00000000000000ff RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000c0
RBP: ffff81083616bea8 R08: 0000000000000006 R09: 000000000000003a
R10: ffff810836279e08 R11: 0000000000000048 R12: ffff810836279e00
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000004000
FS:  0000000000000000(0000) GS:ffff81183611a2c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff81010e9d0000, task ffff811c36155080)
Stack:  0000000000000000 ffff81083616bee8 ffff81083616bf10 0000000000000000
 ffff81083616bf08 0000000500000000 ffff81000102fc60 0000000000000000
 0000000000000000 0000000000000000 0000000000000080 0000000000000000
Call Trace:
 <IRQ>  [<ffffffff8008dba0>] rebalance_tick+0x183/0x3cb
 [<ffffffff8009829f>] update_process_times+0x68/0x78
 [<ffffffff80077bc3>] smp_local_timer_interrupt+0x2f/0x66
 [<ffffffff800781ff>] smp_apic_timer_interrupt+0x41/0x47
 [<ffffffff80057018>] mwait_idle+0x0/0x4a
 [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff8005704e>] mwait_idle+0x36/0x4a
 [<ffffffff80049206>] cpu_idle+0x95/0xb8
 [<ffffffff8007796b>] start_secondary+0x498/0x4a7


Code: 48 f7 f6 49 c1 ee 07 83 7d cc 00 74 1c 48 8b 55 d0 4c 89 a5
RIP  [<ffffffff8008bb03>] find_busiest_group+0x23a/0x621
 RSP <ffff81083616bdb8>
 <0>Kernel panic - not syncing: Fatal exception
 <0>divide error: 0000 [3] SMP
last sysfs file:
CPU 8
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.18-194.17.1.el5_lustre.1.8.7 #1
RIP: 0010:[<ffffffff8008bb03>]  [<ffffffff8008bb03>] find_busiest_group+0x23a/0x621
RSP: 0018:ffff811036173db8  EFLAGS: 00010002
RAX: 0000000000000000 RBX: 00000000000000ff RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000c0
RBP: ffff811036173ea8 R08: 0000000000000012 R09: 000000000000002e
R10: ffff8104363251c8 R11: 0000000000000048 R12: ffff8104363251c0
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff810436286940(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff81083618e000, task ffff810436229820)
Stack:  0000000000000000 ffff811036173ee8 ffff811036173f10 0000000000000000
 ffff811036173f08 0000000800000000 ffff8104360b38e0 0000000000000000
 ffff810436325180 0000000000000000 0000000000000000 0000000000000000
Call Trace:
 <IRQ>  [<ffffffff8008dba0>] rebalance_tick+0x183/0x3cb
 [<ffffffff8009829f>] update_process_times+0x68/0x78
 [<ffffffff80077bc3>] smp_local_timer_interrupt+0x2f/0x66
 [<ffffffff800781ff>] smp_apic_timer_interrupt+0x41/0x47
 [<ffffffff80057018>] mwait_idle+0x0/0x4a
 [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff8005704e>] mwait_idle+0x36/0x4a
 [<ffffffff80049206>] cpu_idle+0x95/0xb8
 [<ffffffff8007796b>] start_secondary+0x498/0x4a7
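
For anyone triaging the dumps above, here is a minimal sketch (Python, not part of the original crash output) that reads the "Code:" bytes and the register dump from the first oops and reports which register the faulting instruction divides by. It assumes the "Code:" line starts at the faulting RIP, which appears to be how this kernel prints it, and it only recognises the register-direct 64-bit DIV encoding (REX.W F7 /6). The helper names parse_registers and decode_div are made up for illustration.

```python
import re

# Excerpt copied from the first oops above (CPU 1).
OOPS = """\
RAX: 0000000000004000 RBX: 00000000000000ff RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000c0
Code: 48 f7 f6 49 c1 ee 07 83 7d cc 00 74 1c 48 8b 55 d0 4c 89 a5
"""

# 64-bit register names indexed by the ModRM r/m field (no REX.B extension).
REG64 = ["RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI"]


def parse_registers(text):
    """Return {name: value} for every 'REG: <16 hex digits>' pair in the dump."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def decode_div(code_bytes):
    """Decode a leading 'REX.W F7 /6' (64-bit unsigned DIV by a register).

    Returns the divisor register name, or None for any other byte sequence.
    ASSUMPTION: code_bytes[0] is the first byte of the faulting instruction.
    """
    if len(code_bytes) < 3:
        return None
    rex, opcode, modrm = code_bytes[:3]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if rex == 0x48 and opcode == 0xF7 and mod == 3 and reg == 6:
        return REG64[rm]
    return None


code = bytes.fromhex(re.search(r"^Code: (.*)$", OOPS, re.M).group(1))
regs = parse_registers(OOPS)
divisor = decode_div(code)
if divisor is not None:
    print(f"div %{divisor.lower()}, {divisor} = {regs[divisor]:#x}")
```

Under those assumptions the leading bytes 48 f7 f6 decode to div %rsi, and RSI is 0 in all three dumps, which would be consistent with the "divide error" line. Which scheduler field that register was loaded from inside find_busiest_group is not determined by this sketch.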


