[Lustre-discuss] BUG - bad scheduling while atomic

Lukas Hejtmanek xhejtman at ics.muni.cz
Thu Sep 3 05:37:17 PDT 2009


Hello,

I'm using Lustre 1.8.1. The server is running a patched 2.6.22.19 kernel
(vanilla) on a Sun Thor server.

The client uses a vanilla 2.6.26.6 kernel, patchless, with the Lustre 1.8.1 client.

I run the following benchmark:
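#!/bin/sh

# iozone launches its remote clients through the command named in RSH.
export rsh="ssh"
export RSH="ssh"

# Run the throughput test with an increasing number of client processes.
for i in 1 2 4 6 12 18 24 30 36 42 48; do
    echo "Clients $i"
    # 8k records, 16G file per process; write (-i0) and read (-i1) tests,
    # with close (-c) and flush (-e) included in the timings.
    ./iozone -RTc -C -t "$i" -r 8k -s16G -e -i0 -i1 -+m hosts.txt
done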


The hosts.txt file lists the 6 clients, repeated in the following pattern:
client1
client2
client3
client4
client5
client6
client1
client2
client3
client4
client5
client6
client1
...
and so on
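
For anyone trying to reproduce this, the round-robin pattern above can be generated rather than typed. This is only a minimal sketch: the function name gen_hosts is hypothetical, the client names are taken from the listing above, and the line count of 48 is an assumption based on the largest -t value in the benchmark loop. Only the host names shown in the post are emitted; if the real file carries additional per-line fields, they would need to be appended.

```shell
#!/bin/sh
# Sketch (assumption): print a round-robin client list in the pattern
# client1..clientN, client1..clientN, ... for a given number of lines.
# gen_hosts is a hypothetical helper, not part of iozone or Lustre.
gen_hosts() {
    nclients=$1   # number of distinct clients (6 in the post)
    nlines=$2     # total lines to emit (assumed: 48, the largest -t value)
    n=0
    while [ "$n" -lt "$nlines" ]; do
        # n % nclients cycles 0..nclients-1; +1 maps it to client1..clientN
        echo "client$(( n % nclients + 1 ))"
        n=$(( n + 1 ))
    done
}

# Print the list to stdout; redirect to a file to use it with -+m.
gen_hosts 6 48
```

Redirecting the output, for example `gen_hosts 6 48 > hosts.txt`, would produce a file with the same ordering as the listing above.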

I get a large number of the following messages on client1. Is there anything
I can do to prevent them? I also get many error logs on the server; two of
them are attached.

[191254.328110] BUG: scheduling while atomic: iozone/11568/0x00000002
[191254.327656]  [<ffffffff804ff30b>] __down+0x5b/0x90
[191254.328116] Pid: 11568, comm: iozone Tainted: P          2.6.26.6 #4
[191254.327709] BUG: scheduling while atomic: iozone/11585/0x00000002
[191254.327664]  [<ffffffff8024fda7>] down+0x47/0x50
[191254.328124] 
[191254.328125] Call Trace:
[191254.327717] Pid: 11585, comm: iozone Tainted: P          2.6.26.6 #4
[191254.328131]  [<ffffffff804fe5f1>] thread_return+0x3d3/0x592
[191254.327722] 
[191254.327723] Call Trace:
[191254.327678]  [<ffffffffa0c46cb4>] :lov:lov_putref+0x34/0xfd0
[191254.328139]  [<ffffffff804fffff>] _spin_lock_irqsave+0x1f/0x50
[191254.327731]  [<ffffffff804fe5f1>] thread_return+0x3d3/0x592
[191254.328110]  [<ffffffff8022aa9c>] task_rq_lock+0x4c/0x90
[191254.328147]  [<ffffffff805003d2>] _spin_unlock_irqrestore+0x12/0x40
[191254.327740]  [<ffffffff80228553>] enqueue_task+0x13/0x30
[191254.328119]  [<ffffffff805003d2>] _spin_unlock_irqrestore+0x12/0x40
[191254.328156]  [<ffffffff8022edb7>] hrtick_set+0x77/0x140
[191254.327748]  [<ffffffff805003d2>] _spin_unlock_irqrestore+0x12/0x40
[191254.328162]  [<ffffffff804fe2ad>] thread_return+0x8f/0x592
[191254.328131]  [<ffffffffa0c664c3>] :lov:lov_stripe_number+0x213/0x280
[191254.327757]  [<ffffffff804fea55>] schedule_timeout+0x95/0xd0
[191254.328170]  [<ffffffff804fea55>] schedule_timeout+0x95/0xd0
[191254.327763]  [<ffffffff804ff30b>] __down+0x5b/0x90
[191254.328143]  [<ffffffffa0c5e928>] :lov:lov_get_info+0x148/0x21d0
[191254.328179]  [<ffffffff804ff30b>] __down+0x5b/0x90
[191254.327772]  [<ffffffff8024fda7>] down+0x47/0x50
[191254.328185]  [<ffffffff8024fda7>] down+0x47/0x50
[191254.328154]  [<ffffffffa0c6bad8>] :lov:lov_fini_enqueue_set+0x2a8/0x320
[191254.327781]  [<ffffffffa0c46a20>] :lov:lov_getref+0x20/0x40
[191254.328194]  [<ffffffffa0c46cb4>] :lov:lov_putref+0x34/0xfd0
[191254.328163]  [<ffffffffa0bf9db4>] :osc:loi_list_maint+0x84/0x110
[191254.328200]  [<ffffffff8022aa9c>] task_rq_lock+0x4c/0x90
[191254.328110]  [<ffffffffa0c5e895>] :lov:lov_get_info+0xb5/0x21d0
[191254.328172]  [<ffffffff80500320>] _spin_unlock+0x10/0x30
[191254.328209]  [<ffffffff805003d2>] _spin_unlock_irqrestore+0x12/0x40
[191254.328178]  [<ffffffffa0c09332>] :osc:osc_trigger_group_io+0x92/0x180
[191254.328122]  [<ffffffffa0c6bad8>] :lov:lov_fini_enqueue_set+0x2a8/0x320
[191254.328218]  [<ffffffffa0c664c3>] :lov:lov_stripe_number+0x213/0x280
[191254.326086]  [<ffffffffa0c664c3>] :lov:lov_stripe_number+0x213/0x280
[191254.328132]  [<ffffffffa0bf9db4>] :osc:loi_list_maint+0x84/0x110
[191254.328136]  [<ffffffff80500320>] _spin_unlock+0x10/0x30
[191254.328231]  [<ffffffffa0c5e928>] :lov:lov_get_info+0x148/0x21d0
[191254.326086]  [<ffffffffa0c5e928>] :lov:lov_get_info+0x148/0x21d0
[191254.328203]  [<ffffffffa0a4e2c1>] :lvfs:lprocfs_counter_add+0xb1/0x120
[191254.328241]  [<ffffffffa0c6bad8>] :lov:lov_fini_enqueue_set+0x2a8/0x320
[191254.326086]  [<ffffffffa0c6bad8>] :lov:lov_fini_enqueue_set+0x2a8/0x320
[191254.328154]  [<ffffffffa0c09332>] :osc:osc_trigger_group_io+0x92/0x180
[191254.328250]  [<ffffffffa0bf9db4>] :osc:loi_list_maint+0x84/0x110
[191254.328219]  [<ffffffffa0aaf7a5>] :obdclass:oig_init+0xa5/0x2c0
[191254.328256]  [<ffffffff80500320>] _spin_unlock+0x10/0x30
[191254.328166]  [<ffffffffa0a4e2c1>] :lvfs:lprocfs_counter_add+0xb1/0x120
[191254.326086]  [<ffffffffa0bf9db4>] :osc:loi_list_maint+0x84/0x110
[191254.328265]  [<ffffffffa0c09332>] :osc:osc_trigger_group_io+0x92/0x180
[191254.326086]  [<ffffffff80500320>] _spin_unlock+0x10/0x30
[191254.328274]  [<ffffffffa0a4e2c1>] :lvfs:lprocfs_counter_add+0xb1/0x120
[191254.326086]  [<ffffffffa0c09332>] :osc:osc_trigger_group_io+0x92/0x180
[191254.328187]  [<ffffffffa0aaf7a5>] :obdclass:oig_init+0xa5/0x2c0
[191254.328248]  [<ffffffffa0cd36e2>] :lustre:ll_readpage+0xd92/0x2060
[191254.326086]  [<ffffffffa0a4e2c1>] :lvfs:lprocfs_counter_add+0xb1/0x120
[191254.328255]  [<ffffffff80500320>] _spin_unlock+0x10/0x30
[191254.328258]  [<ffffffff80500320>] _spin_unlock+0x10/0x30
[191254.328202]  [<ffffffffa0cd36e2>] :lustre:ll_readpage+0xd92/0x2060
[191254.328206]  [<ffffffff80500320>] _spin_unlock+0x10/0x30
[191254.326086]  [<ffffffffa0aaf7a5>] :obdclass:oig_init+0xa5/0x2c0
[191254.328219]  [<ffffffff80500320>] _spin_unlock+0x10/0x30
[191254.328308]  [<ffffffffa0aaf7a5>] :obdclass:oig_init+0xa5/0x2c0
[191254.328283]  [<ffffffffa0b1a05e>] :ptlrpc:unlock_res_and_lock+0x5e/0xe0
[191254.328287]  [<ffffffff80500320>] _spin_unlock+0x10/0x30
[191254.328323]  [<ffffffffa0cd36e2>] :lustre:ll_readpage+0xd92/0x2060
[191254.326086]  [<ffffffffa0cd36e2>] :lustre:ll_readpage+0xd92/0x2060
[191254.328244]  [<ffffffffa0b1a05e>] :ptlrpc:unlock_res_and_lock+0x5e/0xe0
[191254.328332]  [<ffffffff80500320>] _spin_unlock+0x10/0x30
[191254.326086]  [<ffffffff80500320>] _spin_unlock+0x10/0x30
[191254.328252]  [<ffffffff80500320>] _spin_unlock+0x10/0x30
[191254.328341]  [<ffffffff80500320>] _spin_unlock+0x10/0x30
[191254.326086]  [<ffffffff80500320>] _spin_unlock+0x10/0x30
[191254.328273]  [<ffffffffa0b1ece1>] :ptlrpc:ldlm_lock_decref_internal+0x2c1/0x870
[191254.328362]  [<ffffffffa0b1a05e>] :ptlrpc:unlock_res_and_lock+0x5e/0xe0
[191254.326086]  [<ffffffffa0b1a05e>] :ptlrpc:unlock_res_and_lock+0x5e/0xe0
[191254.328334]  [<ffffffffa0b1ece1>] :ptlrpc:ldlm_lock_decref_internal+0x2c1/0x870
[191254.326086]  [<ffffffff80500320>] _spin_unlock+0x10/0x30
[191254.328376]  [<ffffffff80500320>] _spin_unlock+0x10/0x30
[191254.328296]  [<ffffffffa0b3bf10>] :ptlrpc:ldlm_completion_ast+0x0/0x8d0
[191254.328356]  [<ffffffffa0b3bf10>] :ptlrpc:ldlm_completion_ast+0x0/0x8d0
[191254.328307]  [<ffffffffa0caaf70>] :lustre:ll_glimpse_callback+0x0/0x450
[191254.326086]  [<ffffffffa0b1ece1>] :ptlrpc:ldlm_lock_decref_internal+0x2c1/0x870
[191254.328399]  [<ffffffffa0b1ece1>] :ptlrpc:ldlm_lock_decref_internal+0x2c1/0x870
[191254.328376]  [<ffffffffa0caaf70>] :lustre:ll_glimpse_callback+0x0/0x450
[191254.328334]  [<ffffffffa0b1a05e>] :ptlrpc:unlock_res_and_lock+0x5e/0xe0
[191254.326086]  [<ffffffffa0b3bf10>] :ptlrpc:ldlm_completion_ast+0x0/0x8d0
[191254.328424]  [<ffffffffa0b3bf10>] :ptlrpc:ldlm_completion_ast+0x0/0x8d0
[191254.326086]  [<ffffffffa0caaf70>] :lustre:ll_glimpse_callback+0x0/0x450
[191254.328350]  [<ffffffff8027a3ef>] generic_file_aio_read+0x19f/0x570
[191254.328437]  [<ffffffffa0caaf70>] :lustre:ll_glimpse_callback+0x0/0x450
[191254.330047]  [<ffffffffa0b1a05e>] :ptlrpc:unlock_res_and_lock+0x5e/0xe0
[191254.328360]  [<ffffffff8023ae9e>] current_fs_time+0x1e/0x30
[191254.330047]  [<ffffffff8027a3ef>] generic_file_aio_read+0x19f/0x570
[191254.330047]  [<ffffffff8023ae9e>] current_fs_time+0x1e/0x30
[191254.328373]  [<ffffffffa0ca5510>] :lustre:ll_file_aio_read+0x9f0/0x1f70
[191254.328461]  [<ffffffffa0b1a05e>] :ptlrpc:unlock_res_and_lock+0x5e/0xe0
[191254.328380]  [<ffffffff8022b80b>] hrtick_start_fair+0xeb/0x170
[191254.330047]  [<ffffffffa0ca5510>] :lustre:ll_file_aio_read+0x9f0/0x1f70
[191254.328387]  [<ffffffff804fffff>] _spin_lock_irqsave+0x1f/0x50
[191254.326086]  [<ffffffffa0b1a05e>] :ptlrpc:unlock_res_and_lock+0x5e/0xe0
[191254.330047]  [<ffffffff8022b80b>] hrtick_start_fair+0xeb/0x170
[191254.328396]  [<ffffffff8024f0b0>] ktime_get_ts+0x20/0x60
[191254.326086]  [<ffffffff8027a3ef>] generic_file_aio_read+0x19f/0x570
[191254.330047]  [<ffffffff804fffff>] _spin_lock_irqsave+0x1f/0x50
[191254.328405]  [<ffffffff8024f0fc>] ktime_get+0xc/0x50
[191254.326086]  [<ffffffff8023ae9e>] current_fs_time+0x1e/0x30
[191254.330047]  [<ffffffff803722ee>] rb_insert_color+0xde/0x110
[191254.330047]  [<ffffffff805003d2>] _spin_unlock_irqrestore+0x12/0x40
[191254.328502]  [<ffffffff8027a3ef>] generic_file_aio_read+0x19f/0x570
[191254.328422]  [<ffffffffa0caab59>] :lustre:ll_file_read+0xb9/0xd0
[191254.326086]  [<ffffffffa0ca5510>] :lustre:ll_file_aio_read+0x9f0/0x1f70
[191254.328511]  [<ffffffff8023ae9e>] current_fs_time+0x1e/0x30
[191254.328430]  [<ffffffff804fe2ad>] thread_return+0x8f/0x592
[191254.326086]  [<ffffffff8022b80b>] hrtick_start_fair+0xeb/0x170
[191254.330047]  [<ffffffff8024b8c0>] autoremove_wake_function+0x0/0x30
[191254.326086]  [<ffffffff804fffff>] _spin_lock_irqsave+0x1f/0x50
[191254.330053]  [<ffffffff804fe2ad>] thread_return+0x8f/0x592
[191254.330047]  [<ffffffffa0ca5510>] :lustre:ll_file_aio_read+0x9f0/0x1f70
[191254.326086]  [<ffffffff805003d2>] _spin_unlock_irqrestore+0x12/0x40
[191254.330062]  [<ffffffff802abd2f>] rw_verify_area+0x6f/0xd0
[191254.330056]  [<ffffffff8022edd0>] hrtick_set+0x90/0x140
[191254.330068]  [<ffffffff802ac625>] vfs_read+0xc5/0x180
[191254.326086]  [<ffffffff8020ca56>] retint_kernel+0x26/0x30
[191254.330073]  [<ffffffff802acb23>] sys_read+0x53/0x90
[191254.330068]  [<ffffffff80212b79>] read_tsc+0x9/0x20
[191254.330080]  [<ffffffff8020c40b>] system_call_after_swapgs+0x7b/0x80
[191254.330074]  [<ffffffff802517e9>] getnstimeofday+0x39/0xc0
[191254.330047]  [<ffffffffa0caab59>] :lustre:ll_file_read+0xb9/0xd0
[191254.330088] 
[191254.330081]  [<ffffffff8024f0fc>] ktime_get+0xc/0x50
[191254.330047]  [<ffffffff804fe2ad>] thread_return+0x8f/0x592
[191254.330047]  [<ffffffff8024b8c0>] autoremove_wake_function+0x0/0x30
[191254.326086]  [<ffffffff8020b587>] do_notify_resume+0x7/0x910
[191254.330099]  [<ffffffffa0caab59>] :lustre:ll_file_read+0xb9/0xd0
[191254.330105]  [<ffffffff8024b8c0>] autoremove_wake_function+0x0/0x30
[191254.330047]  [<ffffffff802abd2f>] rw_verify_area+0x6f/0xd0
[191254.326086]  [<ffffffffa0caab59>] :lustre:ll_file_read+0xb9/0xd0
[191254.330047]  [<ffffffff802ac625>] vfs_read+0xc5/0x180
[191254.326086]  [<ffffffff8024f0fc>] ktime_get+0xc/0x50
[191254.330047]  [<ffffffff802acb23>] sys_read+0x53/0x90
[191254.326086]  [<ffffffff8024b8c0>] autoremove_wake_function+0x0/0x30
[191254.330047]  [<ffffffff8020c40b>] system_call_after_swapgs+0x7b/0x80
[191254.326086]  [<ffffffff802abd2f>] rw_verify_area+0x6f/0xd0
[191254.330047] 
[191254.330132]  [<ffffffff802abd2f>] rw_verify_area+0x6f/0xd0
[191254.330136]  [<ffffffff802ac625>] vfs_read+0xc5/0x180
[191254.326086]  [<ffffffff802ac625>] vfs_read+0xc5/0x180
[191254.326086]  [<ffffffff802acb23>] sys_read+0x53/0x90
[191254.330146]  [<ffffffff802acb23>] sys_read+0x53/0x90
[191254.330150]  [<ffffffff8020c40b>] system_call_after_swapgs+0x7b/0x80
[191254.326086]  [<ffffffff8020c40b>] system_call_after_swapgs+0x7b/0x80
[191254.326086] 
[191254.330160] 
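
Because the traces from several processes are interleaved, it can help to pull out just the report headers first. The sketch below assumes the fields after "BUG: scheduling while atomic:" are task name, PID and preempt count, separated by slashes, as in the two header lines above. The function name summarize_bugs is hypothetical, and the sample input is copied from the log in this post.

```shell
#!/bin/sh
# Sketch (assumption): list the "scheduling while atomic" reports found in
# kernel log text. summarize_bugs is a hypothetical helper name.
summarize_bugs() {
    # Reads log text on stdin and prints "comm pid preempt_count" for each
    # report header; all other lines (the call traces) are dropped.
    sed -n 's|.*BUG: scheduling while atomic: \(.*\)/\([0-9][0-9]*\)/\(0x[0-9a-fA-F]*\)[[:space:]]*$|\1 \2 \3|p'
}

# Example using the two header lines from the log above.
printf '%s\n' \
    '[191254.328110] BUG: scheduling while atomic: iozone/11568/0x00000002' \
    '[191254.327709] BUG: scheduling while atomic: iozone/11585/0x00000002' |
    summarize_bugs
```

On a live client, something like `dmesg | summarize_bugs | sort | uniq -c` would show how many reports each process produced.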


-- 
Lukáš Hejtmánek
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lustre-log.1251981100.4265
Type: application/octet-stream
Size: 190 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20090903/6716a617/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lustre-log.1251981100.4286
Type: application/octet-stream
Size: 190 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20090903/6716a617/attachment-0001.obj>