Yes, this is a Red Hat 6.1 Xen kernel. It works despite this warning message, but under the /var/spool/abrt/kerneloops-1308854953-1114-9 directory I found a file called backtrace: <div><br></div><div><div>[root@ip-10-83-7-78 kerneloops-1308854953-1114-9]# cat backtrace</div>

<div>WARNING: at kernel/sched.c:7087 __cond_resched_lock+0x8e/0xb0() (Tainted: G        W  ----------------  )</div><div>Modules linked in: ldiskfs(U) lustre(U) lov(U) osc(U) lquota(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) autofs4 ipt_REJECT ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 microcode xen_netfront ext4 mbcache jbd2 xen_blkfront dm_mod [last unloaded: nf_defrag_ipv4]</div>

<div>Pid: 3778, comm: mount.lustre Tainted: G        W  ----------------   2.6.32.lustre21 #6</div><div>Call Trace:</div><div>[<ffffffff81069c37>] ? warn_slowpath_common+0x87/0xc0</div><div>[<ffffffff81069c8a>] ? warn_slowpath_null+0x1a/0x20</div>

<div>[<ffffffff810654fe>] ? __cond_resched_lock+0x8e/0xb0</div><div>[<ffffffff811a82f9>] ? invalidate_inodes+0xc9/0x180</div><div>[<ffffffff8118f548>] ? generic_shutdown_super+0x68/0xe0</div><div>[<ffffffff8118f5f1>] ? kill_block_super+0x31/0x50</div>

<div>[<ffffffff811906b5>] ? deactivate_super+0x85/0xa0</div><div>[<ffffffff811ac5af>] ? mntput_no_expire+0xbf/0x110</div><div>[<ffffffffa0231f8e>] ? unlock_mntput+0x3e/0x60 [obdclass]</div><div>[<ffffffffa0235a98>] ? server_kernel_mount+0x268/0xe80 [obdclass]</div>

<div>[<ffffffffa023ed40>] ? lustre_fill_super+0x0/0x1290 [obdclass]</div><div>[<ffffffffa0237070>] ? lustre_init_lsi+0xd0/0x5b0 [obdclass]</div><div>[<ffffffff810ac71d>] ? lock_release+0xed/0x220</div><div>

[<ffffffffa023efd0>] ? lustre_fill_super+0x290/0x1290 [obdclass]</div><div>[<ffffffff8118ee20>] ? set_anon_super+0x0/0x110</div><div>[<ffffffffa023ed40>] ? lustre_fill_super+0x0/0x1290 [obdclass]</div><div>

[<ffffffff8119035f>] ? get_sb_nodev+0x5f/0xa0</div><div>[<ffffffffa0230885>] ? lustre_get_sb+0x25/0x30 [obdclass]</div><div>[<ffffffff8118ffbb>] ? vfs_kern_mount+0x7b/0x1b0</div><div>[<ffffffff81190162>] ? do_kern_mount+0x52/0x130</div>

<div>[<ffffffff811ae647>] ? do_mount+0x2e7/0x870</div><div>[<ffffffff811aec60>] ? sys_mount+0x90/0xe0</div><div>[<ffffffff8100b132>] ? system_call_fastpath+0x16/0x1b</div><div><br></div><div>You are right that the kernel is still running, but the kerneloops-xxx directory name makes me think it's a crash, perhaps a recoverable one. Any ideas? </div>

<div><br></div><div>-Jon.</div><div>

<br><br><div class="gmail_quote">On Fri, Jun 24, 2011 at 2:43 PM, Oleg Drokin <span dir="ltr"><<a href="mailto:green@whamcloud.com">green@whamcloud.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

Hello~<br>

<div class="im"><br>

On Jun 23, 2011, at 9:51 PM, Jon Zhu wrote:<br>

<br>

> I still got some crashes when running further I/O tests with the build. Here are some system messages containing call stack info that may be useful to you in finding the bug:<br>

<br>

</div><div><div></div><div class="h5">> Jun 23 21:46:12 ip-10-112-59-173 kernel: ------------[ cut here ]------------<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: WARNING: at kernel/sched.c:7087 __cond_resched_lock+0x8e/0xb0() (Not tainted)<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: Modules linked in: lustre(U) lov(U) osc(U) lquota(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) ldiskfs(U) sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt autofs4 ipv6 microcode xen_netfront ext4 mbcache jbd2 xen_blkfront dm_mod [last unloaded: scsi_wait_scan]<br>


> Jun 23 21:46:12 ip-10-112-59-173 kernel: Pid: 1421, comm: mount.lustre Not tainted 2.6.32.lustre21 #6<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: Call Trace:<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff81069c37>] ? warn_slowpath_common+0x87/0xc0<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff81007671>] ? __raw_callee_save_xen_save_fl+0x11/0x1e<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff81069c8a>] ? warn_slowpath_null+0x1a/0x20<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff810654fe>] ? __cond_resched_lock+0x8e/0xb0<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff811a53b7>] ? shrink_dcache_for_umount_subtree+0x187/0x340<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff811a55a6>] ? shrink_dcache_for_umount+0x36/0x60<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff8118f4ff>] ? generic_shutdown_super+0x1f/0xe0<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff8118f5f1>] ? kill_block_super+0x31/0x50<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff811906b5>] ? deactivate_super+0x85/0xa0<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff811ac5af>] ? mntput_no_expire+0xbf/0x110<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffffa0273f8e>] ? unlock_mntput+0x3e/0x60 [obdclass]<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffffa0277a98>] ? server_kernel_mount+0x268/0xe80 [obdclass]<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffffa0280d40>] ? lustre_fill_super+0x0/0x1290 [obdclass]<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffffa0279070>] ? lustre_init_lsi+0xd0/0x5b0 [obdclass]<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff810ac71d>] ? lock_release+0xed/0x220<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffffa0280fd0>] ? lustre_fill_super+0x290/0x1290 [obdclass]<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff8118ee20>] ? set_anon_super+0x0/0x110<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffffa0280d40>] ? lustre_fill_super+0x0/0x1290 [obdclass]<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff8119035f>] ? get_sb_nodev+0x5f/0xa0<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffffa0272885>] ? lustre_get_sb+0x25/0x30 [obdclass]<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff8118ffbb>] ? vfs_kern_mount+0x7b/0x1b0<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff81190162>] ? do_kern_mount+0x52/0x130<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff811ae647>] ? do_mount+0x2e7/0x870<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff811aec60>] ? sys_mount+0x90/0xe0<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff8100b132>] ? system_call_fastpath+0x16/0x1b<br>

> Jun 23 21:46:12 ip-10-112-59-173 kernel: ---[ end trace a8fb737c71bfba13 ]---<br>

<br>

</div></div>This is not a crash; I guess it's just a warning about scheduling in an inappropriate context, and the kernel will continue to work.<br>

Interesting that I have never seen anything like that in RHEL5 Xen kernels; perhaps it's something specific to RHEL 6.1 Xen?<br>
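<br>
For anyone triaging similar reports, here is a rough sketch of how one might tell a plain warning from a fatal report by looking at the log text. The marker strings ("WARNING: at", "Oops:", "Kernel panic") are assumptions based on typical 2.6.x kernel output, and the function names classify_kernel_report and faulting_site are made up for illustration; this is not an abrt or Lustre tool.<br>

```python
import re

# Assumed marker strings, based on typical 2.6.x kernel log output.
FATAL_MARKERS = ("Kernel panic", "Oops:")
WARNING_MARKER = "WARNING: at"


def classify_kernel_report(text):
    """Return 'fatal', 'warning', or 'unknown' for a kernel log excerpt."""
    found_warning = False
    for line in text.splitlines():
        # Any fatal marker wins, even if a warning was seen earlier.
        if any(marker in line for marker in FATAL_MARKERS):
            return "fatal"
        if WARNING_MARKER in line:
            found_warning = True
    return "warning" if found_warning else "unknown"


def faulting_site(text):
    """Return (source_location, function) from the first WARNING line, or None."""
    match = re.search(r"WARNING: at (\S+) (\w+)\+0x", text)
    if match is None:
        return None
    return match.group(1), match.group(2)


# First two lines of the report quoted in this thread.
sample = (
    "WARNING: at kernel/sched.c:7087 __cond_resched_lock+0x8e/0xb0() (Not tainted)\n"
    "Pid: 1421, comm: mount.lustre Not tainted 2.6.32.lustre21 #6\n"
)

print(classify_kernel_report(sample))
print(faulting_site(sample))
```

Fed the backtrace file from the kerneloops-xxx directory, a check along these lines should report a warning rather than a fatal event, because the text contains only the "WARNING: at" line and no oops or panic marker.<br>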

<br>

Bye,<br>

    Oleg<br>

<font color="#888888">--<br>

Oleg Drokin<br>

Senior Software Engineer<br>

Whamcloud, Inc.<br>

<br>

</font></blockquote></div><br></div></div>