[lustre-discuss] Lustre OSS kernel panic after mounting OSTs

Fernando Perez fperez at icm.csic.es
Tue Oct 30 05:28:12 PDT 2018


Dear Riccardo,

Have you tried upgrading the e2fsprogs packages before running e2fsck?

Regards.

=============================================
Fernando Pérez
Institut de Ciències del Mar (CSIC)
Departament Oceanografía Física i Tecnològica
Passeig Marítim de la Barceloneta,37-49
08003 Barcelona
Phone:  (+34) 93 230 96 35
=============================================

On 10/30/2018 01:05 PM, Riccardo Veraldi wrote:
> Hello,
>
> I have a very critical problem.
>
> One of my OSSes goes into a kernel panic when I try to mount the OSTs.
>
> After mounting 11 of the 12 OSTs, it goes into a kernel panic. 
> The order in which they are mounted does not matter.
>
> Any clues or hints?
>
> I cannot really recover it and I have important data on it.
>
> I already ran e2fsck, but it did not fix the problem. It had found a 
> few inode count inconsistencies.
>
> The kernel is 2.6.32-431.23.3.el6_lustre.x86_64
>
> Red Hat Enterprise Linux Server release 6.7 (Santiago)
>
> lustre-2.5.3-2.6.32_431.23.3.el6_lustre.x86_64.x86_64
>
>
> Oct 30 04:58:52 psanaoss231 kernel: INFO: task tgt_recov:4569 blocked for more than 120 seconds.
>
> Oct 30 04:58:52 psanaoss231 kernel:      Not tainted 2.6.32-431.23.3.el6_lustre.x86_64 #1
> Oct 30 04:58:52 psanaoss231 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 30 04:58:52 psanaoss231 kernel: tgt_recov     D 0000000000000003     0  4569      2 0x00000080
> Oct 30 04:58:52 psanaoss231 kernel: ffff880bf2ae1da0 0000000000000046 0000000000000000 0000000000000003
> Oct 30 04:58:52 psanaoss231 kernel: ffff880bf2ae1d30 ffffffff81059096 ffff880bf2ae1d40 ffff880bf2a1d500
> Oct 30 04:58:52 psanaoss231 kernel: ffff880bf2b01ab8 ffff880bf2ae1fd8 000000000000fbc8 ffff880bf2b01ab8
> Oct 30 04:58:52 psanaoss231 kernel: Call Trace:
> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff81059096>] ? enqueue_task+0x66/0x80
> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffffa07ae560>] ? check_for_clients+0x0/0x70 [ptlrpc]
> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffffa07afbcd>] target_recovery_overseer+0x9d/0x230 [ptlrpc]
> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffffa07ae250>] ? exp_connect_healthy+0x0/0x20 [ptlrpc]
> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40
> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffffa07b6490>] ? target_recovery_thread+0x0/0x1920 [ptlrpc]
> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffffa07b69d0>] target_recovery_thread+0x540/0x1920 [ptlrpc]
> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff81061d12>] ? default_wake_function+0x12/0x20
> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffffa07b6490>] ? target_recovery_thread+0x0/0x1920 [ptlrpc]
> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff8109abf6>] kthread+0x96/0xa0
> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff8109ab60>] ? kthread+0x0/0xa0
> Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
> Oct 30 04:59:02 psanaoss231 kernel: Lustre: ana13-OST0004: Recovery over after 3:05, of 147 clients 146 recovered and 1 was evicted.
> Oct 30 04:59:03 psanaoss231 kernel: Lustre: ana13-OST0004: Client 89ba817f-45c3-5e64-99a8-b472651bbe45 (at 172.21.52.213@o2ib) reconnecting
> Oct 30 04:59:03 psanaoss231 kernel: Lustre: Skipped 94 previous similar messages
> Oct 30 04:59:21 psanaoss231 kernel: LustreError: 4569:0:(ost_handler.c:1123:ost_brw_write()) Dropping timed-out write from 12345-172.21.49.129@tcp because locking object 0x0:14198730 took 153 seconds (limit was 30).
> Oct 30 04:59:21 psanaoss231 kernel: Lustre: ana13-OST0005: Bulk IO write error with 3a71df2f-16e7-d507-2495-ab60364d8e7c (at 172.21.49.129@tcp), client will retry: rc -110
> Oct 30 04:59:52 psanaoss231 kernel: ------------[ cut here ]------------
> Oct 30 04:59:52 psanaoss231 kernel: kernel BUG at fs/jbd2/transaction.c:1033!
> Oct 30 04:59:52 psanaoss231 kernel: invalid opcode: 0000 [#1] SMP
> Oct 30 04:59:52 psanaoss231 kernel: last sysfs file: /sys/devices/system/cpu/online
> Oct 30 04:59:52 psanaoss231 kernel: CPU 10
> Oct 30 04:59:52 psanaoss231 kernel: Modules linked in: osp(U) ofd(U) lfsck(U) ost(U) mgc(U) fsfilt_ldiskfs(U) osd_ldiskfs(U) lquota(U) ldiskfs(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic crc32c_intel libcfs(U) nfs lockd fscache auth_rpcgss nfs_acl mpt3sas mpt2sas scsi_transport_sas raid_class mptctl mptbase autofs4 sunrpc ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 microcode power_meter iTCO_wdt iTCO_vendor_support dcdbas ipmi_devintf sb_edac edac_core lpc_ich mfd_core shpchp igb i2c_algo_bit i2c_core ses enclosure sg ixgbe dca ptp pps_core mdio ext4 jbd2 mbcache raid1 sd_mod crc_t10dif ahci wmi mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
> Oct 30 04:59:52 psanaoss231 kernel:
> Oct 30 04:59:52 psanaoss231 kernel: Pid: 4272, comm: ll_ost01_007 Not tainted 2.6.32-431.23.3.el6_lustre.x86_64 #1 Dell Inc. PowerEdge R620/0PXXHP
> Oct 30 04:59:52 psanaoss231 kernel: RIP: 0010:[<ffffffffa01198ad>]  [<ffffffffa01198ad>] jbd2_journal_dirty_metadata+0x10d/0x150 [jbd2]
> Oct 30 04:59:52 psanaoss231 kernel: RSP: 0018:ffff880c058437d0 EFLAGS: 00010246
> Oct 30 04:59:52 psanaoss231 kernel: RAX: ffff880c05573dc0 RBX: ffff880c043b8d08 RCX: ffff88175b0fedc8
> Oct 30 04:59:52 psanaoss231 kernel: RDX: 0000000000000000 RSI: ffff88175b0fedc8 RDI: 0000000000000000
> Oct 30 04:59:52 psanaoss231 kernel: RBP: ffff880c058437f0 R08: 9010000000000000 R09: e886f5e8fbf37202
> Oct 30 04:59:52 psanaoss231 kernel: R10: 0000000000000002 R11: 0000000000000000 R12: ffff880c040c26d8
> Oct 30 04:59:52 psanaoss231 kernel: R13: ffff88175b0fedc8 R14: ffff88174728c800 R15: 0000000000000008
> Oct 30 04:59:52 psanaoss231 kernel: FS:  0000000000000000(0000) GS:ffff8800282a0000(0000) knlGS:0000000000000000
> Oct 30 04:59:52 psanaoss231 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> Oct 30 04:59:52 psanaoss231 kernel: CR2: 00000034f304b750 CR3: 0000000001a85000 CR4: 00000000000407e0
> Oct 30 04:59:52 psanaoss231 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Oct 30 04:59:52 psanaoss231 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Oct 30 04:59:52 psanaoss231 kernel: Process ll_ost01_007 (pid: 4272, threadinfo ffff880c05842000, task ffff880c0634eaa0)
> Oct 30 04:59:52 psanaoss231 kernel: Stack:
> Oct 30 04:59:52 psanaoss231 kernel: ffff880c043b8d08 ffffffffa0d136f0 ffff88175b0fedc8 0000000000000000
> Oct 30 04:59:52 psanaoss231 kernel: <d> ffff880c05843830 ffffffffa0cd100b ffff880c05843820 ffffffff8109af8f
> Oct 30 04:59:52 psanaoss231 kernel: <d> ffff88175b105a40 ffff880c043b8d08 0000000000000018 ffff88175b0fedc8
> Oct 30 04:59:52 psanaoss231 kernel: Call Trace:
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0cd100b>] __ldiskfs_handle_dirty_metadata+0x7b/0x100 [ldiskfs]
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff8109af8f>] ? wake_up_bit+0x2f/0x40
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0d067c5>] ldiskfs_quota_write+0x165/0x210 [ldiskfs]
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff811eef11>] v2_write_file_info+0xa1/0xe0
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff811eb018>] dquot_acquire+0x138/0x140
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0d05956>] ldiskfs_acquire_dquot+0x66/0xb0 [ldiskfs]
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff811ecf8c>] dqget+0x2ac/0x390
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff811ed51b>] dquot_initialize+0x7b/0x240
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff8116f553>] ? kmem_cache_alloc_trace+0x1a3/0x1b0
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0d05bb3>] ldiskfs_dquot_initialize+0x83/0xd0 [ldiskfs]
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0dd0baf>] osd_attr_set+0x12f/0x540 [osd_ldiskfs]
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0ecb969>] dt_attr_set.clone.2+0x29/0xc0 [ofd]
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0ecf472>] ofd_attr_set+0x522/0x6c0 [ofd]
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0ec0e68>] ofd_setattr+0x678/0xc10 [ofd]
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa07eeeae>] ? lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0e711bb>] ost_setattr+0x30b/0x930 [ost]
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0e741bd>] ost_handle+0x1f8d/0x44d0 [ost]
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa07f68db>] ? ptlrpc_update_export_timer+0x4b/0x560 [ptlrpc]
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa07fecf5>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa05164ce>] ? cfs_timer_arm+0xe/0x10 [libcfs]
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa05273cf>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa07f63d9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff810546b9>] ? __wake_up_common+0x59/0x90
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa080005d>] ptlrpc_main+0xaed/0x1740 [ptlrpc]
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa07ff570>] ? ptlrpc_main+0x0/0x1740 [ptlrpc]
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff8109abf6>] kthread+0x96/0xa0
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff8109ab60>] ? kthread+0x0/0xa0
> Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
> Oct 30 04:59:52 psanaoss231 kernel: Code: c6 9c 03 00 00 4c 89 f7 e8 c1 21 41 e1 48 8b 33 ba 01 00 00 00 4c 89 e7 e8 11 ec ff ff 4c 89 f0 66 ff 00 66 66 90 e9 73 ff ff ff <0f> 0b eb fe 0f 0b eb fe 0f 0b 66 0f 1f 84 00 00 00 00 00 eb f5
> Oct 30 04:59:52 psanaoss231 kernel: RIP [<ffffffffa01198ad>] jbd2_journal_dirty_metadata+0x10d/0x150 [jbd2]
> Oct 30 04:59:52 psanaoss231 kernel: RSP <ffff880c058437d0>
> Oct 30 04:59:52 psanaoss231 kernel: ---[ end trace 5ceb40448d3277c6 ]---
> Oct 30 04:59:52 psanaoss231 kernel: Kernel panic - not syncing: Fatal exception
> Oct 30 04:59:52 psanaoss231 kernel: Pid: 4272, comm: ll_ost01_007 Tainted: G      D    --------------- 2.6.32-431.23.3.el6_lustre.x86_64 #1
>


