[Lustre-discuss] Lustre v2.1 RHEL 6.1 build does not work

Jon Zhu jon.zhu at gmail.com
Fri Jun 24 13:00:23 PDT 2011


Yes, this is a Red Hat 6.1 Xen kernel. It keeps running despite this warning
message, but I found a file called backtrace under the
/var/spool/abrt/kerneloops-1308854953-1114-9 directory:

[root at ip-10-83-7-78 kerneloops-1308854953-1114-9]# cat backtrace
WARNING: at kernel/sched.c:7087 __cond_resched_lock+0x8e/0xb0() (Tainted: G
       W  ----------------  )
Modules linked in: ldiskfs(U) lustre(U) lov(U) osc(U) lquota(U) mdc(U)
fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U)
autofs4 ipt_REJECT ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
nf_conntrack ip6table_filter ip6_tables ipv6 microcode xen_netfront ext4
mbcache jbd2 xen_blkfront dm_mod [last unloaded: nf_defrag_ipv4]
Pid: 3778, comm: mount.lustre Tainted: G        W  ----------------
2.6.32.lustre21 #6
Call Trace:
[<ffffffff81069c37>] ? warn_slowpath_common+0x87/0xc0
[<ffffffff81069c8a>] ? warn_slowpath_null+0x1a/0x20
[<ffffffff810654fe>] ? __cond_resched_lock+0x8e/0xb0
[<ffffffff811a82f9>] ? invalidate_inodes+0xc9/0x180
[<ffffffff8118f548>] ? generic_shutdown_super+0x68/0xe0
[<ffffffff8118f5f1>] ? kill_block_super+0x31/0x50
[<ffffffff811906b5>] ? deactivate_super+0x85/0xa0
[<ffffffff811ac5af>] ? mntput_no_expire+0xbf/0x110
[<ffffffffa0231f8e>] ? unlock_mntput+0x3e/0x60 [obdclass]
[<ffffffffa0235a98>] ? server_kernel_mount+0x268/0xe80 [obdclass]
[<ffffffffa023ed40>] ? lustre_fill_super+0x0/0x1290 [obdclass]
[<ffffffffa0237070>] ? lustre_init_lsi+0xd0/0x5b0 [obdclass]
[<ffffffff810ac71d>] ? lock_release+0xed/0x220
[<ffffffffa023efd0>] ? lustre_fill_super+0x290/0x1290 [obdclass]
[<ffffffff8118ee20>] ? set_anon_super+0x0/0x110
[<ffffffffa023ed40>] ? lustre_fill_super+0x0/0x1290 [obdclass]
[<ffffffff8119035f>] ? get_sb_nodev+0x5f/0xa0
[<ffffffffa0230885>] ? lustre_get_sb+0x25/0x30 [obdclass]
[<ffffffff8118ffbb>] ? vfs_kern_mount+0x7b/0x1b0
[<ffffffff81190162>] ? do_kern_mount+0x52/0x130
[<ffffffff811ae647>] ? do_mount+0x2e7/0x870
[<ffffffff811aec60>] ? sys_mount+0x90/0xe0
[<ffffffff8100b132>] ? system_call_fastpath+0x16/0x1b

You are right that the kernel is still running, but this kerneloops-xxx
directory name makes me think it was a crash, perhaps a recoverable one. Any
ideas?

-Jon.
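
P.S. From another look at the trace, the banner seems to come from one of the
kernel's WARN_ON()-style macros (hence warn_slowpath_null and
warn_slowpath_common on the stack) rather than from a real oops, which would
match the machine staying up; abrt apparently files these WARN traces under
kerneloops-* directories as well. A rough illustration of the difference --
not the actual kernel/sched.c code, just a sketch with made-up names:

/*
 * Illustration only: a WARN_ON() prints the "WARNING: at file:line func()"
 * banner plus a backtrace (via warn_slowpath_null/warn_slowpath_common, as
 * seen above) and then simply returns, so the task keeps running.
 * A BUG_ON() would instead trigger a real oops and kill the task.
 */
#include <linux/kernel.h>
#include <linux/bug.h>

static void check_condition(int something_unexpected)
{
        /* emits the banner and "---[ end trace ... ]---", then continues */
        WARN_ON(something_unexpected);

        /* by contrast, this would oops and terminate the current task */
        /* BUG_ON(something_unexpected); */
}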


On Fri, Jun 24, 2011 at 2:43 PM, Oleg Drokin <green at whamcloud.com> wrote:

> Hello~
>
> On Jun 23, 2011, at 9:51 PM, Jon Zhu wrote:
>
> > I still got a crash when I ran some further I/O tests with the build;
> > here are some system messages containing call-stack info that may help
> > you find the bug:
>
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: ------------[ cut here
> ]------------
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: WARNING: at kernel/sched.c:7087
> __cond_resched_lock+0x8e/0xb0() (Not tainted)
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: Modules linked in: lustre(U)
> lov(U) osc(U) lquota(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U)
> obdclass(U) lnet(U) lvfs(U) libcfs(U) ldiskfs(U) sha256_generic cryptd
> aes_x86_64 aes_generic cbc dm_crypt autofs4 ipv6 microcode xen_netfront ext4
> mbcache jbd2 xen_blkfront dm_mod [last unloaded: scsi_wait_scan]
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: Pid: 1421, comm: mount.lustre
> Not tainted 2.6.32.lustre21 #6
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: Call Trace:
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff81069c37>] ?
> warn_slowpath_common+0x87/0xc0
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff81007671>] ?
> __raw_callee_save_xen_save_fl+0x11/0x1e
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff81069c8a>] ?
> warn_slowpath_null+0x1a/0x20
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff810654fe>] ?
> __cond_resched_lock+0x8e/0xb0
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff811a53b7>] ?
> shrink_dcache_for_umount_subtree+0x187/0x340
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff811a55a6>] ?
> shrink_dcache_for_umount+0x36/0x60
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff8118f4ff>] ?
> generic_shutdown_super+0x1f/0xe0
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff8118f5f1>] ?
> kill_block_super+0x31/0x50
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff811906b5>] ?
> deactivate_super+0x85/0xa0
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff811ac5af>] ?
> mntput_no_expire+0xbf/0x110
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffffa0273f8e>] ?
> unlock_mntput+0x3e/0x60 [obdclass]
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffffa0277a98>] ?
> server_kernel_mount+0x268/0xe80 [obdclass]
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffffa0280d40>] ?
> lustre_fill_super+0x0/0x1290 [obdclass]
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffffa0279070>] ?
> lustre_init_lsi+0xd0/0x5b0 [obdclass]
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff810ac71d>] ?
> lock_release+0xed/0x220
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffffa0280fd0>] ?
> lustre_fill_super+0x290/0x1290 [obdclass]
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff8118ee20>] ?
> set_anon_super+0x0/0x110
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffffa0280d40>] ?
> lustre_fill_super+0x0/0x1290 [obdclass]
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff8119035f>] ?
> get_sb_nodev+0x5f/0xa0
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffffa0272885>] ?
> lustre_get_sb+0x25/0x30 [obdclass]
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff8118ffbb>] ?
> vfs_kern_mount+0x7b/0x1b0
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff81190162>] ?
> do_kern_mount+0x52/0x130
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff811ae647>] ?
> do_mount+0x2e7/0x870
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff811aec60>] ?
> sys_mount+0x90/0xe0
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: [<ffffffff8100b132>] ?
> system_call_fastpath+0x16/0x1b
> > Jun 23 21:46:12 ip-10-112-59-173 kernel: ---[ end trace a8fb737c71bfba13
> ]---
>
> This is not a crash; I guess it's just a warning about scheduling in an
> inappropriate context, and the kernel will continue to work.
> Interestingly, I have never seen anything like that in the RHEL5 Xen
> kernels; perhaps it's something specific to the RHEL6.1 Xen kernel?
>
> Bye,
>    Oleg
> --
> Oleg Drokin
> Senior Software Engineer
> Whamcloud, Inc.
>
>