[Lustre-discuss] Lustre-1.8.4 : BUG soft lock up
Joe Landman
landman at scalableinformatics.com
Tue Aug 9 23:27:41 PDT 2011
On 08/10/2011 01:40 AM, Jeff Johnson wrote:
> Greetings,
>
> The below console output is from a 1.8.4 OST (RHEL5.5,
> 2.6.18-194.3.1.el5_lustre.1.8.4, x86_64). Not saying it is a Lustre bug
> for sure. Just wondering if anyone has seen this or something very
> similar. Updating to 1.8.6 WC variant isn't an option at this time.
The CPU was stuck in a kernel swap thread (kswapd0) for more than 10 seconds. This could be a race condition on the disk.
>
> If anyone has some insight into this I'd appreciate the feedback.
>
> Thanks,
>
> --Jeff
>
> BUG: soft lockup - CPU#6 stuck for 10s! [kswapd0:409]
More to the point, it shouldn't be swapping. What do these two commands report?

sysctl -a | grep swappiness
cat /proc/meminfo | grep -i swap

Most likely some process has a memory leak, and you need to flush the cache/swap periodically to keep it from filling up.
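A minimal Python sketch of the same two checks, assuming the usual "Name:  value kB" layout of /proc/meminfo and that vm.swappiness is exposed at /proc/sys/vm/swappiness (the file sysctl reads it from). The function names are hypothetical; this is an illustration, not a Lustre or RHEL tool. SwapUsed is derived here as SwapTotal minus SwapFree.

```python
from pathlib import Path
from typing import Dict


def swap_usage_kib(meminfo_text: str) -> Dict[str, int]:
    """Collect the Swap* fields (values in kB) from /proc/meminfo text
    and add a derived SwapUsed = SwapTotal - SwapFree."""
    fields: Dict[str, int] = {}
    for line in meminfo_text.splitlines():
        # Lines look like "SwapTotal:     2096472 kB"
        name, _, rest = line.partition(":")
        if name.startswith("Swap") and rest.split():
            fields[name] = int(rest.split()[0])
    fields["SwapUsed"] = fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)
    return fields


def read_swappiness(path: str = "/proc/sys/vm/swappiness") -> int:
    """Read the value that `sysctl -a | grep swappiness` shows as vm.swappiness."""
    return int(Path(path).read_text().strip())
```

On the OSS, swap_usage_kib(Path("/proc/meminfo").read_text())["SwapUsed"] gives the swap in use; sampling it periodically and watching whether it keeps growing is one way to test the memory-leak theory.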
> CPU 6:
> RIP: 0010:[<ffffffff801011bf>] [<ffffffff801011bf>] dqput+0x105/0x19f
This is dqput(), the quota "put" routine that releases a reference on a dquot. It takes some nice spin locks, and some of the functions it calls may allocate memory, though I haven't checked.
http://lxr.free-electrons.com/source/fs/quota/dquot.c?a=microblaze#L718
> RSP: 0018:ffff8101be805cd0 EFLAGS: 00000202
> RAX: ffff81012e03f000 RBX: 0000000000000000 RCX: ffff81012e03f000
> RDX: ffffffffffffffe2 RSI: 0000000000000002 RDI: ffff81012f4f01c0
> RBP: ffff81007fb4c918 R08: ffff810000018b00 R09: ffff81007fb4c918
> R10: ffff8101be805c60 R11: ffffffff8b6448f0 R12: ffff8101be805c60
> R13: ffffffff8b6448f0 R14: 00000000ffffffe2 R15: ffffffff8b6448f0
> FS: 0000000000000000(0000) GS:ffff8101bfc2adc0(0000) knlGS:0000000000000000
> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000000402000 CR3: 0000000000201000 CR4: 00000000000006e0
>
> Call Trace:
> [<ffffffff8010182b>] dquot_drop+0x30/0x5e
> [<ffffffff8b647e83>] :ldiskfs:ldiskfs_dquot_drop+0x43/0x70
> [<ffffffff80022d99>] clear_inode+0xb4/0x123
> [<ffffffff80034e52>] dispose_list+0x41/0xe0
> [<ffffffff8002d6a7>] shrink_icache_memory+0x1b7/0x1e6
> [<ffffffff8003f466>] shrink_slab+0xdc/0x153
> [<ffffffff80057e59>] kswapd+0x343/0x46c
> [<ffffffff800a0ab2>] autoremove_wake_function+0x0/0x2e
> [<ffffffff80057b16>] kswapd+0x0/0x46c
> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4
> [<ffffffff80032890>] kthread+0xfe/0x132
> [<ffffffff8009d728>] request_module+0x0/0x14d
> [<ffffffff8005dfb1>] child_rip+0xa/0x11
> [<ffffffff800a089a>] keventd_create_kthread+0x0/0xc4
> [<ffffffff80032792>] kthread+0x0/0x132
> [<ffffffff8005dfa7>] child_rip+0x0/0x11
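Each frame in the trace above has the form [<address>] :module:symbol+offset/size, where offset is how far into the function the address lies and size is the function's total length, both in hex; frames with no :module: prefix are in the core kernel image. A minimal Python sketch of a parser for that RHEL5-era format (hypothetical helper names, not an existing tool) makes it easy to pull out the module frames. In this trace the only one is ldiskfs_dquot_drop, sitting between clear_inode and the generic dquot_drop/dqput code.

```python
import re
from typing import List, NamedTuple, Optional

# Matches e.g. "[<ffffffff8b647e83>] :ldiskfs:ldiskfs_dquot_drop+0x43/0x70"
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?::(?P<module>[\w-]+):)?"  # optional ":module:" prefix
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


class Frame(NamedTuple):
    address: int
    module: Optional[str]  # None means core kernel text
    symbol: str
    offset: int            # bytes into the function
    size: int              # total length of the function in bytes


def parse_trace(text: str) -> List[Frame]:
    """Extract every symbolized frame from console/trace text."""
    frames = []
    for m in FRAME_RE.finditer(text):
        frames.append(Frame(
            address=int(m["addr"], 16),
            module=m["module"],
            symbol=m["symbol"],
            offset=int(m["offset"], 16),
            size=int(m["size"], 16),
        ))
    return frames
```

For example, [f.symbol for f in parse_trace(console_log) if f.module] lists the module frames from a saved console log.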
This could be similar to a couple of known RHEL bugs.
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615