[Lustre-discuss] Question about sleeping processes
Michael Schwartzkopff
misch at multinet.de
Tue Oct 6 08:01:10 PDT 2009
On Tuesday, 6 October 2009 16:22:08, Brian J. Murrell wrote:
> On Tue, 2009-10-06 at 12:48 +0200, Michael Schwartzkopff wrote:
> > Hi,
>
> Hi,
>
> > my system load shows that quite a number of processes are waiting.
>
> Blocked. I guess the word waiting is similar.
>
> > My questions are:
> > What causes the problem?
>
> In this case, the thread has lbugged previously.
>
> If you look in syslog on the node with these processes you should find
> entries with LBUG and/or ASSERTION messages. These are the defects that
> are causing the processes to get blocked (uninterruptible sleep).
(...)
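For anyone following along: on Linux, tasks in uninterruptible sleep (state D) count toward the load average, which is why blocked threads show up as load. Below is a minimal Python sketch for listing such processes by reading /proc/<pid>/stat. The helper names parse_stat_line and blocked_processes are made up for illustration and are not part of Lustre or any existing tool.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen].strip())
    comm = line[lparen + 1:rparen]
    state = line[rparen + 1:].split()[0]
    return pid, comm, state


def blocked_processes(proc_root="/proc"):
    """List (pid, comm) for processes currently in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the entry is unreadable
        if state == "D":
            blocked.append((pid, comm))
    return blocked


if __name__ == "__main__":
    for pid, comm in blocked_processes():
        print(pid, comm)
```

This only shows which processes are blocked at the moment it runs; the reason they are blocked still has to come from the logs.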
Here is some additional information from the logs. Any ideas about that?
Oct 5 10:26:43 sosmds2 kernel: LustreError: 30617:0: (pack_generic.c:655:lustre_shrink_reply_v2()) ASSERTION(msg->lm_bufcount > segment) failed
Oct 5 10:26:43 sosmds2 kernel: LustreError: 30617:0: (pack_generic.c:655:lustre_shrink_reply_v2()) LBUG
Oct 5 10:26:43 sosmds2 kernel: Lustre: 30617:0:(linux-debug.c:264:libcfs_debug_dumpstack()) showing stack for process 30617
Oct 5 10:26:43 sosmds2 kernel: ll_mdt_47 R running task 0 30617 1 30618 30616 (L-TLB)
Oct 5 10:26:43 sosmds2 kernel: 0000000000000000 0000000000000001 0000000714a28100 0000000000000001
Oct 5 10:26:43 sosmds2 kernel: 0000000000000001 0000000000000086 0000000000000012 ffff8102212dfe88
Oct 5 10:26:43 sosmds2 kernel: 0000000000000001 0000000000000000 ffffffff802f6aa0 0000000000000000
Oct 5 10:26:43 sosmds2 kernel: Call Trace:
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff8009daf8>] autoremove_wake_function+0x9/0x2e
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff80088819>] __wake_up_common+0x3e/0x68
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff80088819>] __wake_up_common+0x3e/0x68
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff8008f7ac>] vprintk+0x2cb/0x317
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff800a540a>] kallsyms_lookup+0xc2/0x17b
Oct 5 10:26:43 sosmds2 last message repeated 3 times
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff8006bb5d>] printk_address+0x9f/0xab
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff8008f800>] printk+0x8/0xbd
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff8008f84a>] printk+0x52/0xbd
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff800a2e08>] module_text_address+0x33/0x3c
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff8009c088>] kernel_text_address+0x1a/0x26
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff8006b843>] dump_trace+0x211/0x23a
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff8006b8a0>] show_trace+0x34/0x47
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff8006b9a5>] _show_stack+0xdb/0xea
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff885b1ada>] :libcfs:lbug_with_loc+0x7a/0xd0
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff885b9c70>] :libcfs:tracefile_init+0x0/0x110
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff88712218>] :ptlrpc:lustre_shrink_reply_v2+0xa8/0x240
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff889ec529>] :mds:mds_getattr_lock+0xc59/0xce0
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff88710ea4>] :ptlrpc:lustre_msg_add_version+0x34/0x110
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff88602923>] :lnet:lnet_ni_send+0x93/0xd0
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff88604d23>] :lnet:lnet_send+0x973/0x9a0
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff8005c2dc>] cache_alloc_refill+0x106/0x186
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff889e6fca>] :mds:fixup_handle_for_resent_req+0x5a/0x2c0
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff889f2a76>] :mds:mds_intent_policy+0x636/0xc10
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff886d36f6>] :ptlrpc:ldlm_resource_putref+0x1b6/0x3a0
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff886d0d46>] :ptlrpc:ldlm_lock_enqueue+0x186/0xb30
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff886ecacf>] :ptlrpc:ldlm_export_lock_get+0x6f/0xe0
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff8864fe48>] :obdclass:lustre_hash_add+0x218/0x2e0
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff886f5530>] :ptlrpc:ldlm_server_blocking_ast+0x0/0x83d
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff886f3669>] :ptlrpc:ldlm_handle_enqueue+0xc19/0x1210
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff889f0630>] :mds:mds_handle+0x4080/0x4cb0
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff80148d4f>] __next_cpu+0x19/0x28
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff80088f32>] find_busiest_group+0x20d/0x621
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff88715a15>] :ptlrpc:lustre_msg_get_conn_cnt+0x35/0xf0
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff80089d89>] enqueue_task+0x41/0x56
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff8871a72d>] :ptlrpc:ptlrpc_check_req+0x1d/0x110
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff8871ce67>] :ptlrpc:ptlrpc_server_handle_request+0xa97/0x1160
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff80063098>] thread_return+0x62/0xfe
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff80088819>] __wake_up_common+0x3e/0x68
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff88720908>] :ptlrpc:ptlrpc_main+0x1218/0x13e0
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff8008a3ef>] default_wake_function+0x0/0xe
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff800b48dd>] audit_syscall_exit+0x327/0x342
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff8871f6f0>] :ptlrpc:ptlrpc_main+0x0/0x13e0
Oct 5 10:26:43 sosmds2 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
Oct 5 10:26:43 sosmds2 kernel:
Oct 5 10:26:43 sosmds2 kernel: LustreError: dumping log to /tmp/lustre-log.1254731203.30617
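
To pull the LBUG and ASSERTION entries out of a large syslog without reading every stack trace, something like the following Python sketch might help. It assumes the entry layout seen in the excerpt above (timestamp, host, "kernel: LustreError: pid:n:", then "(file:line:function())" and the message). The regular expressions and the names unwrap and find_lbugs are illustrative assumptions, not a Lustre tool. The unwrap step matters only when lines were wrapped in transit, for example by a mail client.

```python
import re

# A syslog entry starts with a timestamp such as "Oct 5 10:26:43 ".
STAMP_RE = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s")

# LustreError entries that carry an ASSERTION or LBUG message.
EVENT_RE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+kernel:\s+LustreError:\s+(?P<pid>\d+):\d+:\s*"
    r"\((?P<file>[^:()]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s*"
    r"(?P<msg>(?:ASSERTION|LBUG).*)$"
)


def unwrap(lines):
    """Rejoin wrapped entries: a line without a leading timestamp is
    treated as a continuation of the previous entry."""
    entry = None
    for raw in lines:
        line = raw.rstrip("\n")
        if STAMP_RE.match(line):
            if entry is not None:
                yield entry
            entry = line
        elif entry is not None:
            entry += " " + line.strip()
    if entry is not None:
        yield entry


def find_lbugs(lines):
    """Return one dict per ASSERTION/LBUG entry found in the log lines."""
    events = []
    for entry in unwrap(lines):
        match = EVENT_RE.match(entry)
        if match:
            events.append(match.groupdict())
    return events


# Wrapped lines taken from the excerpt above.
SAMPLE = [
    "Oct 5 10:26:43 sosmds2 kernel: LustreError: 30617:0:",
    "(pack_generic.c:655:lustre_shrink_reply_v2()) ASSERTION(msg->lm_bufcount >",
    "segment) failed",
    "Oct 5 10:26:43 sosmds2 kernel: LustreError: 30617:0:",
    "(pack_generic.c:655:lustre_shrink_reply_v2()) LBUG",
    "Oct 5 10:26:43 sosmds2 kernel: LustreError: dumping log to /tmp/lustre-",
    "log.1254731203.30617",
]

if __name__ == "__main__":
    for event in find_lbugs(SAMPLE):
        print(event["stamp"], event["host"], event["pid"],
              event["file"], event["line"], event["func"], event["msg"])
```

For a real log, the lines of an open file can be passed to find_lbugs instead of SAMPLE; the pid field can then be matched against the blocked service threads.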
--
Dr. Michael Schwartzkopff
MultiNET Services GmbH
Address: Bretonischer Ring 7; 85630 Grasbrunn; Germany
Tel: +49 - 89 - 45 69 11 0
Fax: +49 - 89 - 45 69 11 21
mob: +49 - 174 - 343 28 75
mail: misch at multinet.de
web: www.multinet.de
Registered office: 85630 Grasbrunn
Commercial register: Amtsgericht München, HRB 114375
Managing directors: Günter Jurgeneit, Hubert Martens
---
PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
Skype: misch42