[lustre-discuss] Lustre OSS kernel panic after mounting OSTs
Riccardo Veraldi
Riccardo.Veraldi at cnaf.infn.it
Tue Oct 30 05:43:01 PDT 2018
Thank you, Fernando, for the hint. I have just upgraded e2fsprogs and am
running e2fsck again.
Anyway, my problem was this:
https://jira.whamcloud.com/browse/LU-5040
Thank you.
On 10/30/18 5:28 AM, Fernando Perez wrote:
> Dear Riccardo.
>
> Have you tried upgrading the e2fsprogs packages before running e2fsck?
>
> Regards.
>
> =============================================
> Fernando Pérez
> Institut de Ciències del Mar (CSIC)
> Departament Oceanografía Física i Tecnològica
> Passeig Marítim de la Barceloneta,37-49
> 08003 Barcelona
> Phone: (+34) 93 230 96 35
> =============================================
>
> On 10/30/2018 01:05 PM, Riccardo Veraldi wrote:
>> Hello,
>>
>> I have a very critical problem.
>>
>> One of my OSSes goes into a kernel panic when trying to mount the OSTs.
>>
>> After mounting 11 of the 12 OSTs it goes into kernel panic.
>> The order in which they are mounted does not matter.
>>
>> Any clues or hints?
>>
>> I cannot really recover it and I have important data on it.
>>
>> I already ran e2fsck, but it did not fix the problem. It did find a
>> few inode count inconsistencies.
>>
>> kernel is 2.6.32-431.23.3.el6_lustre.x86_64
>>
>> Red Hat Enterprise Linux Server release 6.7 (Santiago)
>>
>> lustre-2.5.3-2.6.32_431.23.3.el6_lustre.x86_64.x86_64
>>
>>
>> Oct 30 04:58:52 psanaoss231 kernel: INFO: task tgt_recov:4569 blocked
>> for more than 120 seconds.
>>
>> Oct 30 04:58:52 psanaoss231 kernel: Not tainted
>> 2.6.32-431.23.3.el6_lustre.x86_64 #1
>> Oct 30 04:58:52 psanaoss231 kernel: "echo 0 >
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Oct 30 04:58:52 psanaoss231 kernel: tgt_recov D
>> 0000000000000003 0 4569 2 0x00000080
>> Oct 30 04:58:52 psanaoss231 kernel: ffff880bf2ae1da0 0000000000000046
>> 0000000000000000 0000000000000003
>> Oct 30 04:58:52 psanaoss231 kernel: ffff880bf2ae1d30 ffffffff81059096
>> ffff880bf2ae1d40 ffff880bf2a1d500
>> Oct 30 04:58:52 psanaoss231 kernel: ffff880bf2b01ab8 ffff880bf2ae1fd8
>> 000000000000fbc8 ffff880bf2b01ab8
>> Oct 30 04:58:52 psanaoss231 kernel: Call Trace:
>> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff81059096>] ?
>> enqueue_task+0x66/0x80
>> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffffa07ae560>] ?
>> check_for_clients+0x0/0x70 [ptlrpc]
>> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffffa07afbcd>]
>> target_recovery_overseer+0x9d/0x230 [ptlrpc]
>> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffffa07ae250>] ?
>> exp_connect_healthy+0x0/0x20 [ptlrpc]
>> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff8109afa0>] ?
>> autoremove_wake_function+0x0/0x40
>> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffffa07b6490>] ?
>> target_recovery_thread+0x0/0x1920 [ptlrpc]
>> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffffa07b69d0>]
>> target_recovery_thread+0x540/0x1920 [ptlrpc]
>> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff81061d12>] ?
>> default_wake_function+0x12/0x20
>> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffffa07b6490>] ?
>> target_recovery_thread+0x0/0x1920 [ptlrpc]
>> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff8109abf6>]
>> kthread+0x96/0xa0
>> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff8100c20a>]
>> child_rip+0xa/0x20
>> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff8109ab60>] ?
>> kthread+0x0/0xa0
>> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff8100c200>] ?
>> child_rip+0x0/0x20
>> Oct 30 04:59:02 psanaoss231 kernel: Lustre: ana13-OST0004: Recovery
>> over after 3:05, of 147 clients 146 recovered and 1 was evicted.
>> Oct 30 04:59:03 psanaoss231 kernel: Lustre: ana13-OST0004: Client
>> 89ba817f-45c3-5e64-99a8-b472651bbe45 (at 172.21.52.213@o2ib)
>> reconnecting
>> Oct 30 04:59:03 psanaoss231 kernel: Lustre: Skipped 94 previous
>> similar messages
>> Oct 30 04:59:21 psanaoss231 kernel: LustreError:
>> 4569:0:(ost_handler.c:1123:ost_brw_write()) Dropping timed-out write
>> from 12345-172.21.49.129@tcp because locking object 0x0:14198730 took
>> 153 seconds (limit was 30).
>> Oct 30 04:59:21 psanaoss231 kernel: Lustre: ana13-OST0005: Bulk IO
>> write error with 3a71df2f-16e7-d507-2495-ab60364d8e7c (at
>> 172.21.49.129@tcp), client will retry: rc -110
>> Oct 30 04:59:52 psanaoss231 kernel: ------------[ cut here ]------------
>> Oct 30 04:59:52 psanaoss231 kernel: kernel BUG at
>> fs/jbd2/transaction.c:1033!
>> Oct 30 04:59:52 psanaoss231 kernel: invalid opcode: 0000 [#1] SMP
>> Oct 30 04:59:52 psanaoss231 kernel: last sysfs file:
>> /sys/devices/system/cpu/online
>> Oct 30 04:59:52 psanaoss231 kernel: CPU 10
>> Oct 30 04:59:52 psanaoss231 kernel: Modules linked in: osp(U) ofd(U)
>> lfsck(U) ost(U) mgc(U) fsfilt_ldiskfs(U) osd_ldiskfs(U) lquota(U)
>> ldiskfs(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U)
>> ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic
>> sha256_generic crc32c_intel libcfs(U) nfs lockd fscache auth_rpcgss
>> nfs_acl mpt3sas mpt2sas scsi_transport_sas raid_class mptctl mptbase
>> autofs4 sunrpc ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4
>> nf_conntrack nf_defrag_ipv4 ip_tables ib_ipoib rdma_ucm ib_ucm
>> ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 microcode
>> power_meter iTCO_wdt iTCO_vendor_support dcdbas ipmi_devintf sb_edac
>> edac_core lpc_ich mfd_core shpchp igb i2c_algo_bit i2c_core ses
>> enclosure sg ixgbe dca ptp pps_core mdio ext4 jbd2 mbcache raid1
>> sd_mod crc_t10dif ahci wmi mlx4_ib ib_sa ib_mad ib_core mlx4_en
>> mlx4_core megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last
>> unloaded: speedstep_lib]
>> Oct 30 04:59:52 psanaoss231 kernel:
>> Oct 30 04:59:52 psanaoss231 kernel: Pid: 4272, comm: ll_ost01_007 Not
>> tainted 2.6.32-431.23.3.el6_lustre.x86_64 #1 Dell Inc. PowerEdge
>> R620/0PXXHP
>> Oct 30 04:59:52 psanaoss231 kernel: RIP: 0010:[<ffffffffa01198ad>]
>> [<ffffffffa01198ad>] jbd2_journal_dirty_metadata+0x10d/0x150 [jbd2]
>> Oct 30 04:59:52 psanaoss231 kernel: RSP: 0018:ffff880c058437d0
>> EFLAGS: 00010246
>> Oct 30 04:59:52 psanaoss231 kernel: RAX: ffff880c05573dc0 RBX:
>> ffff880c043b8d08 RCX: ffff88175b0fedc8
>> Oct 30 04:59:52 psanaoss231 kernel: RDX: 0000000000000000 RSI:
>> ffff88175b0fedc8 RDI: 0000000000000000
>> Oct 30 04:59:52 psanaoss231 kernel: RBP: ffff880c058437f0 R08:
>> 9010000000000000 R09: e886f5e8fbf37202
>> Oct 30 04:59:52 psanaoss231 kernel: R10: 0000000000000002 R11:
>> 0000000000000000 R12: ffff880c040c26d8
>> Oct 30 04:59:52 psanaoss231 kernel: R13: ffff88175b0fedc8 R14:
>> ffff88174728c800 R15: 0000000000000008
>> Oct 30 04:59:52 psanaoss231 kernel: FS: 0000000000000000(0000)
>> GS:ffff8800282a0000(0000) knlGS:0000000000000000
>> Oct 30 04:59:52 psanaoss231 kernel: CS: 0010 DS: 0018 ES: 0018 CR0:
>> 000000008005003b
>> Oct 30 04:59:52 psanaoss231 kernel: CR2: 00000034f304b750 CR3:
>> 0000000001a85000 CR4: 00000000000407e0
>> Oct 30 04:59:52 psanaoss231 kernel: DR0: 0000000000000000 DR1:
>> 0000000000000000 DR2: 0000000000000000
>> Oct 30 04:59:52 psanaoss231 kernel: DR3: 0000000000000000 DR6:
>> 00000000ffff0ff0 DR7: 0000000000000400
>> Oct 30 04:59:52 psanaoss231 kernel: Process ll_ost01_007 (pid: 4272,
>> threadinfo ffff880c05842000, task ffff880c0634eaa0)
>> Oct 30 04:59:52 psanaoss231 kernel: Stack:
>> Oct 30 04:59:52 psanaoss231 kernel: ffff880c043b8d08 ffffffffa0d136f0
>> ffff88175b0fedc8 0000000000000000
>> Oct 30 04:59:52 psanaoss231 kernel: <d> ffff880c05843830
>> ffffffffa0cd100b ffff880c05843820 ffffffff8109af8f
>> Oct 30 04:59:52 psanaoss231 kernel: <d> ffff88175b105a40
>> ffff880c043b8d08 0000000000000018 ffff88175b0fedc8
>> Oct 30 04:59:52 psanaoss231 kernel: Call Trace:
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0cd100b>]
>> __ldiskfs_handle_dirty_metadata+0x7b/0x100 [ldiskfs]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff8109af8f>] ?
>> wake_up_bit+0x2f/0x40
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0d067c5>]
>> ldiskfs_quota_write+0x165/0x210 [ldiskfs]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff811eef11>]
>> v2_write_file_info+0xa1/0xe0
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff811eb018>]
>> dquot_acquire+0x138/0x140
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0d05956>]
>> ldiskfs_acquire_dquot+0x66/0xb0 [ldiskfs]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff811ecf8c>]
>> dqget+0x2ac/0x390
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff811ed51b>]
>> dquot_initialize+0x7b/0x240
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff8116f553>] ?
>> kmem_cache_alloc_trace+0x1a3/0x1b0
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0d05bb3>]
>> ldiskfs_dquot_initialize+0x83/0xd0 [ldiskfs]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0dd0baf>]
>> osd_attr_set+0x12f/0x540 [osd_ldiskfs]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0ecb969>]
>> dt_attr_set.clone.2+0x29/0xc0 [ofd]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0ecf472>]
>> ofd_attr_set+0x522/0x6c0 [ofd]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0ec0e68>]
>> ofd_setattr+0x678/0xc10 [ofd]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa07eeeae>] ?
>> lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0e711bb>]
>> ost_setattr+0x30b/0x930 [ost]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0e741bd>]
>> ost_handle+0x1f8d/0x44d0 [ost]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa07f68db>] ?
>> ptlrpc_update_export_timer+0x4b/0x560 [ptlrpc]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa07fecf5>]
>> ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa05164ce>] ?
>> cfs_timer_arm+0xe/0x10 [libcfs]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa05273cf>] ?
>> lc_watchdog_touch+0x6f/0x170 [libcfs]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa07f63d9>] ?
>> ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff810546b9>] ?
>> __wake_up_common+0x59/0x90
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa080005d>]
>> ptlrpc_main+0xaed/0x1740 [ptlrpc]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa07ff570>] ?
>> ptlrpc_main+0x0/0x1740 [ptlrpc]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff8109abf6>]
>> kthread+0x96/0xa0
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff8100c20a>]
>> child_rip+0xa/0x20
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff8109ab60>] ?
>> kthread+0x0/0xa0
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff8100c200>] ?
>> child_rip+0x0/0x20
>> Oct 30 04:59:52 psanaoss231 kernel: Code: c6 9c 03 00 00 4c 89 f7 e8
>> c1 21 41 e1 48 8b 33 ba 01 00 00 00 4c 89 e7 e8 11 ec ff ff 4c 89 f0
>> 66 ff 00 66 66 90 e9 73 ff ff ff <0f> 0b eb fe 0f 0b eb fe 0f 0b 66
>> 0f 1f 84 00 00 00 00 00 eb f5
>> Oct 30 04:59:52 psanaoss231 kernel: RIP [<ffffffffa01198ad>]
>> jbd2_journal_dirty_metadata+0x10d/0x150 [jbd2]
>> Oct 30 04:59:52 psanaoss231 kernel: RSP <ffff880c058437d0>
>> Oct 30 04:59:52 psanaoss231 kernel: ---[ end trace 5ceb40448d3277c6 ]---
>> Oct 30 04:59:52 psanaoss231 kernel: Kernel panic - not syncing: Fatal
>> exception
>> Oct 30 04:59:52 psanaoss231 kernel: Pid: 4272, comm: ll_ost01_007
>> Tainted: G D ---------------
>> 2.6.32-431.23.3.el6_lustre.x86_64 #1
>>
>> _______________________________________________
>> lustre-discuss mailing list
>> lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org