[Lustre-discuss] Softlockup issues. Lustre related?
Alex Lee
alee at datadirectnet.com
Wed Aug 27 20:20:47 PDT 2008
Hello Folks,
I have a few client nodes that are getting soft lockup errors. These are patchless clients running Lustre 1.6.5.1 with kernel 2.6.18-53.1.6.el5-PAPI, which is more or less stock RHEL 5.1 with the PAPI patch added. The MDS and OSS are running Lustre 1.6.5.1 with the supplied Lustre kernels and OFED 1.3.1.
I remember there was an issue with __d_lookup in the past, but I thought it was fixed in the newest release of Lustre, so I don't know if this is related in any way. I don't see any other real Lustre error messages on the client or the MDS/OSS at the time of the soft lockups. Also, wasn't there a softirq issue? I don't think this is related to that...
Thanks,
-Alex
Here is the syslog output:
Aug 24 04:02:03 papi-0476 syslogd 1.4.1: restart.
Aug 25 00:52:24 papi-0476 kernel: Losing some ticks... checking if CPU frequency changed.
Aug 25 15:17:05 papi-0476 kernel: cpsMPI.x[10493]: segfault at 00002aaab23ce000 rip 00000000004a8fce rsp 00007fff584d6cf0 error 4
Aug 27 04:15:00 papi-0476 kernel: BUG: soft lockup detected on CPU#4!
Aug 27 04:15:00 papi-0476 kernel:
Aug 27 04:15:00 papi-0476 kernel: Call Trace:
Aug 27 04:15:00 papi-0476 kernel: <IRQ> [<ffffffff800b619e>] softlockup_tick+0xd5/0xe7
Aug 27 04:15:00 papi-0476 kernel: [<ffffffff800910ab>] tasklet_action+0x62/0xac
Aug 27 04:15:00 papi-0476 kernel: [<ffffffff800941bb>] update_process_times+0x54/0x7a
Aug 27 04:15:00 papi-0476 kernel: [<ffffffff80075727>] smp_local_timer_interrupt+0x2c/0x61
Aug 27 04:15:00 papi-0476 kernel: [<ffffffff8005d368>] call_softirq+0x1c/0x28
Aug 27 04:15:00 papi-0476 kernel: [<ffffffff80075def>] smp_apic_timer_interrupt+0x41/0x47
Aug 27 04:15:00 papi-0476 kernel: [<ffffffff8005cc8e>] apic_timer_interrupt+0x66/0x6c
Aug 27 04:15:00 papi-0476 kernel: <EOI> [<ffffffff8002c285>] dummy_inode_permission+0x0/0x3
Aug 27 04:15:00 papi-0476 kernel: [<ffffffff800094bb>] __d_lookup+0xd2/0xff
Aug 27 04:15:00 papi-0476 kernel: [<ffffffff80009499>] __d_lookup+0xb0/0xff
Aug 27 04:15:00 papi-0476 kernel: [<ffffffff8000c990>] do_lookup+0x2c/0x1d4
Aug 27 04:15:00 papi-0476 kernel: [<ffffffff80009ee9>] __link_path_walk+0xa01/0xf42
Aug 27 04:15:00 papi-0476 kernel: [<ffffffff8000e5ca>] link_path_walk+0x5c/0xe5
Aug 27 04:15:00 papi-0476 kernel: [<ffffffff8000c7e8>] do_path_lookup+0x270/0x2e8
Aug 27 04:15:01 papi-0476 kernel: [<ffffffff800234de>] __user_walk_fd+0x37/0x4c
Aug 27 04:15:01 papi-0476 kernel: [<ffffffff8003e6cf>] vfs_lstat_fd+0x18/0x47
Aug 27 04:15:01 papi-0476 kernel: [<ffffffff800dc1b0>] sys_newfstatat+0x22/0x43
Aug 27 04:15:01 papi-0476 kernel: [<ffffffff8005c116>] system_call+0x7e/0x83
Aug 27 04:15:01 papi-0476 kernel:
Aug 28 04:03:14 papi-0476 kernel: BUG: soft lockup detected on CPU#0!
Aug 28 04:03:14 papi-0476 kernel:
Aug 28 04:03:14 papi-0476 kernel: Call Trace:
Aug 28 04:03:14 papi-0476 kernel: <IRQ> [<ffffffff800b619e>] softlockup_tick+0xd5/0xe7
Aug 28 04:03:14 papi-0476 kernel: [<ffffffff800941bb>] update_process_times+0x54/0x7a
Aug 28 04:03:14 papi-0476 kernel: [<ffffffff80075727>] smp_local_timer_interrupt+0x2c/0x61
Aug 28 04:03:14 papi-0476 kernel: [<ffffffff80075def>] smp_apic_timer_interrupt+0x41/0x47
Aug 28 04:03:14 papi-0476 kernel: [<ffffffff8005cc8e>] apic_timer_interrupt+0x66/0x6c
Aug 28 04:03:14 papi-0476 kernel: <EOI> [<ffffffff8002c285>] dummy_inode_permission+0x0/0x3
Aug 28 04:03:14 papi-0476 kernel: [<ffffffff800094cb>] __d_lookup+0xe2/0xff
Aug 28 04:03:14 papi-0476 kernel: [<ffffffff80009499>] __d_lookup+0xb0/0xff
Aug 28 04:03:14 papi-0476 kernel: [<ffffffff8000c990>] do_lookup+0x2c/0x1d4
Aug 28 04:03:14 papi-0476 kernel: [<ffffffff80009ee9>] __link_path_walk+0xa01/0xf42
Aug 28 04:03:14 papi-0476 kernel: [<ffffffff8000e5ca>] link_path_walk+0x5c/0xe5
Aug 28 04:03:14 papi-0476 kernel: [<ffffffff8000c7e8>] do_path_lookup+0x270/0x2e8
Aug 28 04:03:14 papi-0476 kernel: [<ffffffff800234de>] __user_walk_fd+0x37/0x4c
Aug 28 04:03:14 papi-0476 kernel: [<ffffffff8003e6cf>] vfs_lstat_fd+0x18/0x47
Aug 28 04:03:14 papi-0476 kernel: [<ffffffff800dc1b0>] sys_newfstatat+0x22/0x43
Aug 28 04:03:14 papi-0476 kernel: [<ffffffff8005c116>] system_call+0x7e/0x83
Aug 28 04:03:14 papi-0476 kernel:
Aug 28 04:12:12 papi-0476 kernel: BUG: soft lockup detected on CPU#3!
Aug 28 04:12:17 papi-0476 kernel:
Aug 28 04:12:18 papi-0476 kernel: Call Trace:
Aug 28 04:12:18 papi-0476 kernel: <IRQ> [<ffffffff800b619e>] softlockup_tick+0xd5/0xe7
Aug 28 04:12:18 papi-0476 kernel: [<ffffffff800941bb>] update_process_times+0x54/0x7a
Aug 28 04:12:18 papi-0476 kernel: [<ffffffff80075727>] smp_local_timer_interrupt+0x2c/0x61
Aug 28 04:12:18 papi-0476 kernel: [<ffffffff80075def>] smp_apic_timer_interrupt+0x41/0x47
Aug 28 04:12:18 papi-0476 kernel: [<ffffffff8005cc8e>] apic_timer_interrupt+0x66/0x6c
Aug 28 04:12:18 papi-0476 kernel: [<ffffffff80007660>] kmem_cache_free+0x1c0/0x1cb
Aug 28 04:12:18 papi-0476 kernel: [<ffffffff8009a5dc>] __rcu_process_callbacks+0x122/0x1a8
Aug 28 04:12:18 papi-0476 kernel: [<ffffffff8009a685>] rcu_process_callbacks+0x23/0x43
Aug 28 04:12:18 papi-0476 kernel: [<ffffffff800910ab>] tasklet_action+0x62/0xac
Aug 28 04:12:18 papi-0476 kernel: [<ffffffff80011cc5>] __do_softirq+0x5e/0xd5
Aug 28 04:12:18 papi-0476 kernel: [<ffffffff8005d368>] call_softirq+0x1c/0x28
Aug 28 04:12:18 papi-0476 kernel: [<ffffffff8006b565>] do_softirq+0x2c/0x85
Aug 28 04:12:18 papi-0476 kernel: [<ffffffff8005cc8e>] apic_timer_interrupt+0x66/0x6c
Aug 28 04:12:18 papi-0476 kernel: <EOI> [<ffffffff800cb2c6>] __put_swap_token+0xb/0x52
Aug 28 04:12:18 papi-0476 kernel: [<ffffffff8003b7dc>] mmput+0x68/0x83
Aug 28 04:12:18 papi-0476 kernel: [<ffffffff80014f94>] do_exit+0x28b/0x89d
Aug 28 04:12:18 papi-0476 kernel: [<ffffffff80046eac>] cpuset_exit+0x0/0x6c
Aug 28 04:12:18 papi-0476 kernel: [<ffffffff8002ad92>] get_signal_to_deliver+0x42c/0x45a
Aug 28 04:12:18 papi-0476 kernel: [<ffffffff800590f9>] do_notify_resume+0x9c/0x7a9
Aug 28 04:12:18 papi-0476 kernel: [<ffffffff80021d2e>] __up_read+0x19/0x7f
Aug 28 04:12:19 papi-0476 kernel: [<ffffffff80065ac9>] do_page_fault+0x4eb/0x81d
Aug 28 04:12:19 papi-0476 kernel: [<ffffffff800270db>] do_filp_open+0x2a/0x38
Aug 28 04:12:19 papi-0476 kernel: [<ffffffff8003fc68>] do_ioctl+0x5c/0x6b
Aug 28 04:12:19 papi-0476 kernel: [<ffffffff8005c6dc>] retint_signal+0x3d/0x79
Aug 28 04:12:19 papi-0476 kernel:
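For anyone comparing traces like these across several clients, the following is a minimal sketch (not a Lustre or kernel tool) that groups each soft lockup by CPU and lists the functions printed after the <EOI> marker, i.e. the code that was running when the timer interrupt fired. The function name summarize_lockups and the regular expressions are assumptions made for illustration; they are written against the 2.6.18-style log format shown above and may need adjusting for other kernels.

```python
import re

# Matches the "BUG: soft lockup detected on CPU#N!" header line.
LOCKUP_RE = re.compile(r"BUG: soft lockup detected on CPU#(\d+)!")
# Matches one stack frame such as "[<ffffffff800094bb>] __d_lookup+0xd2/0xff".
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_lockups(lines):
    """Return one (cpu, [function names after <EOI>]) tuple per lockup.

    Hypothetical helper: it assumes the syslog layout shown in this post.
    """
    events = []
    current = None
    in_task = False  # True once the <EOI> marker has been seen
    for line in lines:
        header = LOCKUP_RE.search(line)
        if header:
            # Start a new event; frames are collected until the next header.
            current = (int(header.group(1)), [])
            events.append(current)
            in_task = False
            continue
        if current is None:
            continue  # ignore anything logged before the first lockup
        if "<EOI>" in line:
            in_task = True
        frame = FRAME_RE.search(line)
        if frame and in_task:
            current[1].append(frame.group(1))
    return events


if __name__ == "__main__":
    import sys

    # Usage (path is an assumption): python lockups.py /var/log/messages
    with open(sys.argv[1], errors="replace") as log:
        for cpu, funcs in summarize_lockups(log):
            print(f"CPU#{cpu}: {' <- '.join(funcs[:4])}")
```

Run against the output above, this should make it easier to see at a glance which lockups were interrupted in __d_lookup and which were in the exit path.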