[Lustre-discuss] Question about sleeping processes

Brian J. Murrell Brian.Murrell at Sun.COM
Tue Oct 6 07:22:08 PDT 2009


On Tue, 2009-10-06 at 12:48 +0200, Michael Schwartzkopff wrote:
> Hi,

Hi,

> my system load shows that quite a number of processes are waiting.

Blocked, rather.  I guess "waiting" is similar enough.
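Blocked tasks are the ones in uninterruptible sleep (state D).  The sketch below lists them by reading the state field of each /proc/<pid>/stat.  It assumes a Linux /proc filesystem, and `blocked_processes` is a hypothetical helper, not part of any Lustre tool; `ps -eo pid,stat,comm` shows the same information.

```python
import os


def blocked_processes(proc_root="/proc"):
    """Return sorted (pid, comm) pairs for tasks in uninterruptible sleep ('D')."""
    result = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo or /proc/self
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # the process exited between listdir() and open()
        # The format is "pid (comm) state ...".  comm may itself contain
        # spaces or ')', so the state is located after the LAST ')'.
        close = stat.rindex(")")
        comm = stat[stat.index("(") + 1 : close]
        state = stat[close + 2]
        if state == "D":
            result.append((int(entry), comm))
    return sorted(result)
```

On a node in the state described here, the stuck Lustre service threads would be expected to appear in this list.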

> My questions are:
> What causes the problem?

In this case, the thread previously hit an LBUG.

If you look in the syslog of the node running these processes, you should
find entries with LBUG and/or ASSERTION messages.  These point to the
defects that are causing the processes to block in uninterruptible sleep
(state D).
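A minimal sketch of that syslog search, assuming standard syslog lines of the form "<date> <host> kernel: ..." and that the messages contain the literal keywords LBUG or ASSERTION (the surrounding text varies between Lustre versions).  `lustre_failures` is a hypothetical helper name.

```python
import re

# Case-sensitive on purpose: matches "LBUG" and "ASSERTION(...)" but not
# lowercase symbol names such as lbug_with_loc in a call trace.
PATTERN = re.compile(r"\b(LBUG|ASSERTION)\b")


def lustre_failures(lines, host=None):
    """Yield syslog lines mentioning LBUG or ASSERTION, optionally for one host."""
    for line in lines:
        # The hostname is the space-delimited field after the timestamp.
        if host is not None and f" {host} " not in line:
            continue
        if PATTERN.search(line):
            yield line.rstrip("\n")
```

For example, iterate over `lustre_failures(f, host="sosmds2")` with `f` opened on the syslog file.  The usual path on Red Hat-style systems is /var/log/messages, but that is an assumption and may differ on your node.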

> Can I kill the "hanging" processes?

Nope.  You have to reboot the node.

Please search Bugzilla for the LBUG/ASSERTION messages you are getting,
and if you don't find anything that matches, please file a new bug.

> Oct  5 10:28:03 sosmds2 kernel: Lustre: 0:0:(watchdog.c:181:lcw_cb()) Watchdog triggered for pid 28402: it was inactive for 200.00s
> Oct  5 10:28:03 sosmds2 kernel: ll_mdt_35     D ffff81000100c980     0 28402      1         28403 28388 (L-TLB)
> Oct  5 10:28:03 sosmds2 kernel:  ffff81041c723810 0000000000000046 0000000000000000 7fffffffffffffff
> Oct  5 10:28:03 sosmds2 kernel:  ffff81041c7237d0 0000000000000001 ffff81022f3e60c0 ffff81022f12e080
> Oct  5 10:28:03 sosmds2 kernel:  000177b2feff847c 00000000000014df ffff81022f3e62a8 000000010000028f
> Oct  5 10:28:03 sosmds2 kernel: Call Trace:
> Oct  5 10:28:03 sosmds2 kernel:  [<ffffffff8008a3ef>] default_wake_function+0x0/0xe
> Oct  5 10:28:03 sosmds2 kernel:  [<ffffffff885b1b26>] :libcfs:lbug_with_loc+0xc6/0xd0

The lbug_with_loc frame is where you can see that the thread has LBUGged.
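To pick such threads out of a longer log, a rough heuristic sketch: remember the pid from each "Watchdog triggered for pid N" message and flag it if lbug_with_loc appears before the next watchdog report.  It assumes each kernel message sits on its own line, as in the syslog file, and that watchdog stack dumps are not interleaved; `lbugged_pids` is a hypothetical helper name.

```python
import re

# Matches the watchdog message format shown in the log above.
WATCHDOG = re.compile(r"Watchdog triggered for pid (\d+)")


def lbugged_pids(lines):
    """Return watchdog-reported pids whose following call trace has lbug_with_loc."""
    pids = set()
    current = None  # pid from the most recent watchdog report, if any
    for line in lines:
        m = WATCHDOG.search(line)
        if m:
            current = int(m.group(1))
        elif current is not None and "lbug_with_loc" in line:
            pids.add(current)
    return pids
```

Any pid it reports is one whose stack ended up in the LBUG handler, which matches the situation above: that thread stays blocked until the node is rebooted.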

b.
