[Lustre-discuss] Kernel BUG causing kernel panic kernel 2.6.9-55.0.9.EL_lustre.1.6.3smp
Wojciech Turek
wjt27 at cam.ac.uk
Mon Nov 19 08:26:37 PST 2007
Dear All,
We are experiencing frequent OSS crashes.
We have 4 OSSs, and each OSS serves 6 OSTs to 600 clients. We
see random OSS crashes every 1-2 days. The console output captured
during a crash is below.
Does this look familiar to any of you? We have seen the same crashes
with Lustre 1.6.2.
Nov 18 15:17:21 storage08 heartbeat: [25566]: info: Checking status of STONITH
Nov 18 15:17:21 storage08 heartbeat: [24250]: info: Exiting STONITH-stat process
Kernel BUG at mballoc:3352
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U)
ldiskfs(U) lustre(U) lov(U) mdc(U) lquota(U) ptlrpc(U) obdclass(U)
lvfs(U) sg(U) ksocklnd(U) lnet(U) libcfs(U) cxgb3(U) ipmi_si(U)
ipmi_devintf(U) ipmi_msghandler(U) md5(U) ipv6(U) autofs4(U)
i2c_nforce2(U) i2c_amd756(U) i2c_isa(U) i2c_amd8111(U) i2c_i801(U)
i2c_core(U) mptctl(U) dm_mirror(U) dm_round_robin(U) dm_multipath(U)
dm_mod(U) sr_mod(U) usb_storage(U) joydev(U) button(U) battery(U)
ac(U) uhci_hcd(U) ehci_hcd(U) hw_random(U) qla2400(U) qla2xxx(U)
scsi_transport_fc(U) ata_piix(U) ext3(U) jbd(U) xfs(U) tg3(U) s2io(U)
nfs(U) nfs_acl(U) lockd(U) sunrpc(U) mptsas(U) mptscsi(U) mptbase(U)
megaraid_sas(U) e1000(U) bnx2(U) sd_mod(U)
Pid: 9070, comm: ll_ost_io_151 Tainted: GF 2.6.9-55.0.9.EL_lustre.1.6.3smp
RIP: 0010:[<ffffffffa05e2923>] <ffffffffa05e2923>{:ldiskfs:ldiskfs_mb_generate_from_pa+179}
RSP: 0018:00000100c9721268 EFLAGS: 00010297
RAX: 0000000000002177 RBX: 0000000000000000 RCX: 00000100c9721288
RDX: 0000000000000000 RSI: 0000000000002178 RDI: 0000010077ce42b0
RBP: 0000010077ce4290 R08: 00000100c9721280 R09: 01ff80000007c008
R10: 0000080000000000 R11: ffffffffffffffff R12: 0000010077ce42b0
R13: 000001007fb09000 R14: 0000000000000000 R15: 00000100ad763c28
FS: 0000002a95565b00(0000) GS:ffffffff804a6700(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a984c80e8 CR3: 0000000000101000 CR4: 00000000000006e0
Process ll_ost_io_151 (pid: 9070, threadinfo 00000100c9720000, task 00000100c96f1800)
Stack: 0000000000001000 0000000000002177 00000100b5196400
0000000000002178
0000000000000000 0000000000002177 00000100a56d6ee0
0000000000000000
0000000000002177 0000000000000000
Call Trace:<ffffffffa05e310a>{:ldiskfs:ldiskfs_mb_init_cache+1898}
<ffffffffa05e3340>{:ldiskfs:ldiskfs_mb_load_buddy+304}
<ffffffffa05e96e2>{:ldiskfs:ldiskfs_mb_free_blocks+626}
<ffffffffa0180920>{:jbd:journal_get_write_access+48}
<ffffffff801589d9>{find_get_page+65}
<ffffffff801798e7>{__find_get_block_slow+62}
<ffffffff8017a097>{__find_get_block+162}
<ffffffffa0180920>{:jbd:journal_get_write_access+48}
<ffffffffa05c9933>{:ldiskfs:ldiskfs_free_blocks+163}
<ffffffffa05e165a>{:ldiskfs:ldiskfs_remove_blocks+282}
<ffffffffa05e0ff4>{:ldiskfs:ldiskfs_ext_remove_space+1508}
<ffffffffa05ce27c>{:ldiskfs:ldiskfs_mark_inode_dirty+76}
<ffffffffa05e1f80>{:ldiskfs:ldiskfs_ext_truncate+368}
<ffffffffa05cfcb5>{:ldiskfs:ldiskfs_truncate+309}
<ffffffff80167df9>{unmap_mapping_range+339}
<ffffffffa05ce11a>{:ldiskfs:ldiskfs_mark_iloc_dirty+1034}
<ffffffff80167ea4>{vmtruncate+162}
<ffffffff80191c88>{inode_setattr+41}
<ffffffffa05cf5bc>{:ldiskfs:ldiskfs_setattr+444}
<ffffffffa062ae72>{:fsfilt_ldiskfs:fsfilt_ldiskfs_setattr+386}
<ffffffffa064af7b>{:obdfilter:filter_destroy+3131}
<ffffffffa0456da0>{:ptlrpc:ldlm_completion_ast+0}
<ffffffff802f069d>{tcp_rcv_established+2099}
<ffffffffa047bd83>{:ptlrpc:lustre_msg_add_version+83}
<ffffffffa047d205>{:ptlrpc:lustre_msg_check_version+69}
<ffffffffa061a25d>{:ost:ost_handle+6397}
<ffffffff802dfc76>{ip_rcv+1046}
<ffffffff802c6861>{netif_receive_skb+791}
<ffffffffa031a9ba>{:cxgb3:lro_flush_session+154}
<ffffffffa035fb58>{:lnet:lnet_match_blocked_msg+920}
<ffffffffa0485b4c>{:ptlrpc:ptlrpc_server_handle_request+3036}
<ffffffffa033cbae>{:libcfs:lcw_update_time+30}
<ffffffff8013f448>{__mod_timer+293}
<ffffffffa04881d8>{:ptlrpc:ptlrpc_main+2504}
<ffffffff80133566>{default_wake_function+0}
<ffffffffa0486860>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa0486860>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffff80110de3>{child_rip+8}
<ffffffffa0487810>{:ptlrpc:ptlrpc_main+0}
<ffffffff80110ddb>{child_rip+0}
Code: 0f 0b d2 bb 5e a0 ff ff ff ff 18 0d 90 8b 4c 24 20 8d 34 0b
RIP <ffffffffa05e2923>{:ldiskfs:ldiskfs_mb_generate_from_pa+179} RSP <00000100c9721268>
<0>Kernel panic - not syncing: Oops
Best regards
Wojciech Turek
Mr Wojciech Turek
Assistant System Manager
University of Cambridge
High Performance Computing service
email: wjt27 at cam.ac.uk
tel. +441223763517