[lustre-discuss] Lustre OSS kernel panic after mounting OSTs

Riccardo Veraldi Riccardo.Veraldi at cnaf.infn.it
Tue Oct 30 05:43:01 PDT 2018


Thank you, Fernando, for the hint. I have just upgraded e2fsprogs and am 
running e2fsck again.
Anyway, my problem was this:

https://jira.whamcloud.com/browse/LU-5040

Thank you.

On 10/30/18 5:28 AM, Fernando Perez wrote:
> Dear Riccardo.
>
> Have you tried upgrading the e2fsprogs packages before performing the e2fsck?
>
> Regards.
>
> =============================================
> Fernando Pérez
> Institut de Ciències del Mar (CSIC)
> Departament Oceanografía Física i Tecnològica
> Passeig Marítim de la Barceloneta,37-49
> 08003 Barcelona
> Phone:  (+34) 93 230 96 35
> =============================================
>
> On 10/30/2018 01:05 PM, Riccardo Veraldi wrote:
>> Hello,
>>
>> I have a very critical problem.
>>
>> One of my OSSes hangs and goes into a kernel panic when trying to mount the OSTs.
>>
>> After mounting 11 of the 12 OSTs, it goes into a kernel panic. 
>> The order in which they are mounted does not matter.
>>
>> Any clues or hints?
>>
>> I cannot really recover it and I have important data on it.
>>
>> I already ran e2fsck, but it did not fix the problem. It did find a 
>> few inode count inconsistencies.
>>
>> kernel is 2.6.32-431.23.3.el6_lustre.x86_64
>>
>> Red Hat Enterprise Linux Server release 6.7 (Santiago)
>>
>> lustre-2.5.3-2.6.32_431.23.3.el6_lustre.x86_64.x86_64
>>
>>
>> Oct 30 04:58:52 psanaoss231 kernel: INFO: task tgt_recov:4569 blocked 
>> for more than 120 seconds.
>>
>> Oct 30 04:58:52 psanaoss231 kernel:      Not tainted 
>> 2.6.32-431.23.3.el6_lustre.x86_64 #1
>> Oct 30 04:58:52 psanaoss231 kernel: "echo 0 > 
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Oct 30 04:58:52 psanaoss231 kernel: tgt_recov     D 
>> 0000000000000003     0  4569      2 0x00000080
>> Oct 30 04:58:52 psanaoss231 kernel: ffff880bf2ae1da0 0000000000000046 
>> 0000000000000000 0000000000000003
>> Oct 30 04:58:52 psanaoss231 kernel: ffff880bf2ae1d30 ffffffff81059096 
>> ffff880bf2ae1d40 ffff880bf2a1d500
>> Oct 30 04:58:52 psanaoss231 kernel: ffff880bf2b01ab8 ffff880bf2ae1fd8 
>> 000000000000fbc8 ffff880bf2b01ab8
>> Oct 30 04:58:52 psanaoss231 kernel: Call Trace:
>> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff81059096>] ? 
>> enqueue_task+0x66/0x80
>> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffffa07ae560>] ? 
>> check_for_clients+0x0/0x70 [ptlrpc]
>> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffffa07afbcd>] 
>> target_recovery_overseer+0x9d/0x230 [ptlrpc]
>> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffffa07ae250>] ? 
>> exp_connect_healthy+0x0/0x20 [ptlrpc]
>> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff8109afa0>] ? 
>> autoremove_wake_function+0x0/0x40
>> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffffa07b6490>] ? 
>> target_recovery_thread+0x0/0x1920 [ptlrpc]
>> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffffa07b69d0>] 
>> target_recovery_thread+0x540/0x1920 [ptlrpc]
>> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff81061d12>] ? 
>> default_wake_function+0x12/0x20
>> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffffa07b6490>] ? 
>> target_recovery_thread+0x0/0x1920 [ptlrpc]
>> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff8109abf6>] 
>> kthread+0x96/0xa0
>> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff8100c20a>] 
>> child_rip+0xa/0x20
>> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff8109ab60>] ? 
>> kthread+0x0/0xa0
>> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff8100c200>] ? 
>> child_rip+0x0/0x20
>> Oct 30 04:59:02 psanaoss231 kernel: Lustre: ana13-OST0004: Recovery 
>> over after 3:05, of 147 clients 146 recovered and 1 was evicted.
>> Oct 30 04:59:03 psanaoss231 kernel: Lustre: ana13-OST0004: Client 
>> 89ba817f-45c3-5e64-99a8-b472651bbe45 (at 172.21.52.213@o2ib) 
>> reconnecting
>> Oct 30 04:59:03 psanaoss231 kernel: Lustre: Skipped 94 previous 
>> similar messages
>> Oct 30 04:59:21 psanaoss231 kernel: LustreError: 
>> 4569:0:(ost_handler.c:1123:ost_brw_write()) Dropping timed-out write 
>> from 12345-172.21.49.129@tcp because locking object 0x0:14198730 took 
>> 153 seconds (limit was 30).
>> Oct 30 04:59:21 psanaoss231 kernel: Lustre: ana13-OST0005: Bulk IO 
>> write error with 3a71df2f-16e7-d507-2495-ab60364d8e7c (at 
>> 172.21.49.129@tcp), client will retry: rc -110
>> Oct 30 04:59:52 psanaoss231 kernel: ------------[ cut here ]------------
>> Oct 30 04:59:52 psanaoss231 kernel: kernel BUG at 
>> fs/jbd2/transaction.c:1033!
>> Oct 30 04:59:52 psanaoss231 kernel: invalid opcode: 0000 [#1] SMP
>> Oct 30 04:59:52 psanaoss231 kernel: last sysfs file: 
>> /sys/devices/system/cpu/online
>> Oct 30 04:59:52 psanaoss231 kernel: CPU 10
>> Oct 30 04:59:52 psanaoss231 kernel: Modules linked in: osp(U) ofd(U) 
>> lfsck(U) ost(U) mgc(U) fsfilt_ldiskfs(U) osd_ldiskfs(U) lquota(U) 
>> ldiskfs(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U) 
>> ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic 
>> sha256_generic crc32c_intel libcfs(U) nfs lockd fscache auth_rpcgss 
>> nfs_acl mpt3sas mpt2sas scsi_transport_sas raid_class mptctl mptbase 
>> autofs4 sunrpc ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 
>> nf_conntrack nf_defrag_ipv4 ip_tables ib_ipoib rdma_ucm ib_ucm 
>> ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 microcode 
>> power_meter iTCO_wdt iTCO_vendor_support dcdbas ipmi_devintf sb_edac 
>> edac_core lpc_ich mfd_core shpchp igb i2c_algo_bit i2c_core ses 
>> enclosure sg ixgbe dca ptp pps_core mdio ext4 jbd2 mbcache raid1 
>> sd_mod crc_t10dif ahci wmi mlx4_ib ib_sa ib_mad ib_core mlx4_en 
>> mlx4_core megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last 
>> unloaded: speedstep_lib]
>> Oct 30 04:59:52 psanaoss231 kernel:
>> Oct 30 04:59:52 psanaoss231 kernel: Pid: 4272, comm: ll_ost01_007 Not 
>> tainted 2.6.32-431.23.3.el6_lustre.x86_64 #1 Dell Inc. PowerEdge 
>> R620/0PXXHP
>> Oct 30 04:59:52 psanaoss231 kernel: RIP: 0010:[<ffffffffa01198ad>]  
>> [<ffffffffa01198ad>] jbd2_journal_dirty_metadata+0x10d/0x150 [jbd2]
>> Oct 30 04:59:52 psanaoss231 kernel: RSP: 0018:ffff880c058437d0 
>> EFLAGS: 00010246
>> Oct 30 04:59:52 psanaoss231 kernel: RAX: ffff880c05573dc0 RBX: 
>> ffff880c043b8d08 RCX: ffff88175b0fedc8
>> Oct 30 04:59:52 psanaoss231 kernel: RDX: 0000000000000000 RSI: 
>> ffff88175b0fedc8 RDI: 0000000000000000
>> Oct 30 04:59:52 psanaoss231 kernel: RBP: ffff880c058437f0 R08: 
>> 9010000000000000 R09: e886f5e8fbf37202
>> Oct 30 04:59:52 psanaoss231 kernel: R10: 0000000000000002 R11: 
>> 0000000000000000 R12: ffff880c040c26d8
>> Oct 30 04:59:52 psanaoss231 kernel: R13: ffff88175b0fedc8 R14: 
>> ffff88174728c800 R15: 0000000000000008
>> Oct 30 04:59:52 psanaoss231 kernel: FS:  0000000000000000(0000) 
>> GS:ffff8800282a0000(0000) knlGS:0000000000000000
>> Oct 30 04:59:52 psanaoss231 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 
>> 000000008005003b
>> Oct 30 04:59:52 psanaoss231 kernel: CR2: 00000034f304b750 CR3: 
>> 0000000001a85000 CR4: 00000000000407e0
>> Oct 30 04:59:52 psanaoss231 kernel: DR0: 0000000000000000 DR1: 
>> 0000000000000000 DR2: 0000000000000000
>> Oct 30 04:59:52 psanaoss231 kernel: DR3: 0000000000000000 DR6: 
>> 00000000ffff0ff0 DR7: 0000000000000400
>> Oct 30 04:59:52 psanaoss231 kernel: Process ll_ost01_007 (pid: 4272, 
>> threadinfo ffff880c05842000, task ffff880c0634eaa0)
>> Oct 30 04:59:52 psanaoss231 kernel: Stack:
>> Oct 30 04:59:52 psanaoss231 kernel: ffff880c043b8d08 ffffffffa0d136f0 
>> ffff88175b0fedc8 0000000000000000
>> Oct 30 04:59:52 psanaoss231 kernel: <d> ffff880c05843830 
>> ffffffffa0cd100b ffff880c05843820 ffffffff8109af8f
>> Oct 30 04:59:52 psanaoss231 kernel: <d> ffff88175b105a40 
>> ffff880c043b8d08 0000000000000018 ffff88175b0fedc8
>> Oct 30 04:59:52 psanaoss231 kernel: Call Trace:
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0cd100b>] 
>> __ldiskfs_handle_dirty_metadata+0x7b/0x100 [ldiskfs]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff8109af8f>] ? 
>> wake_up_bit+0x2f/0x40
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0d067c5>] 
>> ldiskfs_quota_write+0x165/0x210 [ldiskfs]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff811eef11>] 
>> v2_write_file_info+0xa1/0xe0
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff811eb018>] 
>> dquot_acquire+0x138/0x140
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0d05956>] 
>> ldiskfs_acquire_dquot+0x66/0xb0 [ldiskfs]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff811ecf8c>] 
>> dqget+0x2ac/0x390
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff811ed51b>] 
>> dquot_initialize+0x7b/0x240
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff8116f553>] ? 
>> kmem_cache_alloc_trace+0x1a3/0x1b0
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0d05bb3>] 
>> ldiskfs_dquot_initialize+0x83/0xd0 [ldiskfs]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0dd0baf>] 
>> osd_attr_set+0x12f/0x540 [osd_ldiskfs]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0ecb969>] 
>> dt_attr_set.clone.2+0x29/0xc0 [ofd]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0ecf472>] 
>> ofd_attr_set+0x522/0x6c0 [ofd]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0ec0e68>] 
>> ofd_setattr+0x678/0xc10 [ofd]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa07eeeae>] ? 
>> lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0e711bb>] 
>> ost_setattr+0x30b/0x930 [ost]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0e741bd>] 
>> ost_handle+0x1f8d/0x44d0 [ost]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa07f68db>] ? 
>> ptlrpc_update_export_timer+0x4b/0x560 [ptlrpc]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa07fecf5>] 
>> ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa05164ce>] ? 
>> cfs_timer_arm+0xe/0x10 [libcfs]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa05273cf>] ? 
>> lc_watchdog_touch+0x6f/0x170 [libcfs]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa07f63d9>] ? 
>> ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff810546b9>] ? 
>> __wake_up_common+0x59/0x90
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa080005d>] 
>> ptlrpc_main+0xaed/0x1740 [ptlrpc]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa07ff570>] ? 
>> ptlrpc_main+0x0/0x1740 [ptlrpc]
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff8109abf6>] 
>> kthread+0x96/0xa0
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff8100c20a>] 
>> child_rip+0xa/0x20
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff8109ab60>] ? 
>> kthread+0x0/0xa0
>> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff8100c200>] ? 
>> child_rip+0x0/0x20
>> Oct 30 04:59:52 psanaoss231 kernel: Code: c6 9c 03 00 00 4c 89 f7 e8 
>> c1 21 41 e1 48 8b 33 ba 01 00 00 00 4c 89 e7 e8 11 ec ff ff 4c 89 f0 
>> 66 ff 00 66 66 90 e9 73 ff ff ff <0f> 0b eb fe 0f 0b eb fe 0f 0b 66 
>> 0f 1f 84 00 00 00 00 00 eb f5
>> Oct 30 04:59:52 psanaoss231 kernel: RIP [<ffffffffa01198ad>] 
>> jbd2_journal_dirty_metadata+0x10d/0x150 [jbd2]
>> Oct 30 04:59:52 psanaoss231 kernel: RSP <ffff880c058437d0>
>> Oct 30 04:59:52 psanaoss231 kernel: ---[ end trace 5ceb40448d3277c6 ]---
>> Oct 30 04:59:52 psanaoss231 kernel: Kernel panic - not syncing: Fatal 
>> exception
>> Oct 30 04:59:52 psanaoss231 kernel: Pid: 4272, comm: ll_ost01_007 
>> Tainted: G      D    --------------- 
>> 2.6.32-431.23.3.el6_lustre.x86_64 #1
>>
>> _______________________________________________
>> lustre-discuss mailing list
>> lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org