[lustre-discuss] Lustre 2.12 client crashes

Peter Jones pjones at whamcloud.com
Mon Jan 20 09:22:58 PST 2020


Christopher

Apologies for the confusing message about requesting a JIRA account. I'll see if we can remove it, but I think it might be system-generated. We've had to disable self-registration because of repeated hacking attempts via that mechanism. The message on the left, "For questions or login request, send email to Jira administrators", does work: the link there sends an email to info at whamcloud.com, and several requests come through per week via that channel. I can see why the message on the right would draw your eye, though.

Peter

On 2020-01-20, 8:15 AM, "lustre-discuss on behalf of Christopher Mountford" <lustre-discuss-bounces at lists.lustre.org on behalf of cjm14 at leicester.ac.uk> wrote:

    We've seen 3 Lustre client panics in the last few hours when using the b2_12 branch (we're using it on client nodes as it patches a Data-on-MDT bug in 2.12.3; we're still using 2.12.3 on the MDS/OSS). This looks similar to LU-12581, which we had seen on our system before but which was fixed in 2.12.3. Could this have been re-introduced in the b2_12 branch?
    
    I've included the dmesg from one of the panics below. Unfortunately we have not yet found a way to reproduce the problem. Has anyone seen anything similar to this?
    
    Is this mailing list a suitable place to ask for help on this sort of bug? I've been looking at the Whamcloud Community Jira, but the link to request an account returns "Your Jira administrator has not yet configured this contact form."
    
    dmesg from failed client:
    
    [542909.741793] =============================================================================
    [542909.741800] BUG kmalloc-8 (Tainted: G           OE  ------------  ): Freechain corrupt
    [542909.741802] -----------------------------------------------------------------------------
    
    [542909.741805] Disabling lock debugging due to kernel taint
    [542909.741809] INFO: Slab 0xffffe0933440b3c0 objects=102 used=75 fp=0xffff9bb6902cf558 flags=0x6fffff00000081
    [542909.741812] INFO: Object 0xffff9bb6902cfad0 @offset=2768 fp=0x7fff9bb6902cfdf0
    
    [542909.741816] Redzone ffff9bb6902cfac8: bb 3b 3b 3b 3b bb bb bb                          .;;;;...
    [542909.741818] Object ffff9bb6902cfad0: 6b 6b 6b 6b 6b 6b 6b a5                          kkkkkkk.
    [542909.741821] Redzone ffff9bb6902cfad8: bb bb bb 3b bb bb bb bb                          ...;....
    [542909.741823] Padding ffff9bb6902cfae8: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
    [542909.741828] CPU: 25 PID: 50461 Comm: pool Kdump: loaded Tainted: G    B      OE  ------------   3.10.0-1062.9.1.el7.x86_64 #1
    [542909.741830] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/21/2019
    [542909.741832] Call Trace:
    [542909.741846]  [<ffffffffa277ac23>] dump_stack+0x19/0x1b
    [542909.741852]  [<ffffffffa2221561>] print_trailer+0x161/0x280
    [542909.741856]  [<ffffffffa2221ebf>] on_freelist+0xff/0x270
    [542909.741860]  [<ffffffffa27774cc>] free_debug_processing+0x18d/0x270
    [542909.741867]  [<ffffffffa21ddcb5>] ? kvfree+0x35/0x40
    [542909.741870]  [<ffffffffa2223bee>] __slab_free+0x1ce/0x290
    [542909.741878]  [<ffffffffa2272e58>] ? generic_setxattr+0x68/0x80
    [542909.741883]  [<ffffffffa2273635>] ? __vfs_setxattr_noperm+0x65/0x1b0
    [542909.741889]  [<ffffffffa232b7ae>] ? evm_inode_setxattr+0xe/0x10
    [542909.741892]  [<ffffffffa21ddcb5>] ? kvfree+0x35/0x40
    [542909.741895]  [<ffffffffa2223db6>] kfree+0x106/0x140
    [542909.741899]  [<ffffffffa21ddcb5>] kvfree+0x35/0x40
    [542909.741902]  [<ffffffffa227399b>] setxattr+0x15b/0x1e0
    [542909.741909]  [<ffffffffa225c3ed>] ? putname+0x3d/0x60
    [542909.741914]  [<ffffffffa225d602>] ? user_path_at_empty+0x72/0xc0
    [542909.741920]  [<ffffffffa224d828>] ? __sb_start_write+0x58/0x120
    [542909.741926]  [<ffffffffa22802f1>] ? do_utimes+0xf1/0x180
    [542909.741930]  [<ffffffffa2273c87>] SyS_setxattr+0xb7/0x100
    [542909.741937]  [<ffffffffa278dede>] system_call_fastpath+0x25/0x2a
    [542909.741940] =============================================================================
    [542909.741942] BUG kmalloc-8 (Tainted: G    B      OE  ------------  ): Wrong object count. Counter is 75 but counted were 95
    [542909.741944] -----------------------------------------------------------------------------
    
    [542909.741947] INFO: Slab 0xffffe0933440b3c0 objects=102 used=75 fp=0xffff9bb6902cf558 flags=0x6fffff00000081
    [542909.741951] CPU: 25 PID: 50461 Comm: pool Kdump: loaded Tainted: G    B      OE  ------------   3.10.0-1062.9.1.el7.x86_64 #1
    [542909.741953] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/21/2019
    [542909.741954] Call Trace:
    [542909.741958]  [<ffffffffa277ac23>] dump_stack+0x19/0x1b
    [542909.741961]  [<ffffffffa2221b54>] slab_err+0xb4/0xe0
    [542909.741969]  [<ffffffffa2030a1e>] ? show_stack+0x4e/0x60
    [542909.741972]  [<ffffffffa2221561>] ? print_trailer+0x161/0x280
    [542909.741975]  [<ffffffffa2221f85>] on_freelist+0x1c5/0x270
    [542909.742227]  [<ffffffffa27774cc>] free_debug_processing+0x18d/0x270
    [542909.742479]  [<ffffffffa21ddcb5>] ? kvfree+0x35/0x40
    [542909.742483]  [<ffffffffa2223bee>] __slab_free+0x1ce/0x290
    [542909.742488]  [<ffffffffa2272e58>] ? generic_setxattr+0x68/0x80
    [542909.742491]  [<ffffffffa2273635>] ? __vfs_setxattr_noperm+0x65/0x1b0
    [542909.742495]  [<ffffffffa232b7ae>] ? evm_inode_setxattr+0xe/0x10
    [542909.742498]  [<ffffffffa21ddcb5>] ? kvfree+0x35/0x40
    [542909.742501]  [<ffffffffa2223db6>] kfree+0x106/0x140
    [542909.742504]  [<ffffffffa21ddcb5>] kvfree+0x35/0x40
    [542909.742508]  [<ffffffffa227399b>] setxattr+0x15b/0x1e0
    [542909.742511]  [<ffffffffa225c3ed>] ? putname+0x3d/0x60
    [542909.742515]  [<ffffffffa225d602>] ? user_path_at_empty+0x72/0xc0
    [542909.742519]  [<ffffffffa224d828>] ? __sb_start_write+0x58/0x120
    [542909.742523]  [<ffffffffa22802f1>] ? do_utimes+0xf1/0x180
    [542909.742527]  [<ffffffffa2273c87>] SyS_setxattr+0xb7/0x100
    [542909.742530]  [<ffffffffa278dede>] system_call_fastpath+0x25/0x2a
    [542909.742533] FIX kmalloc-8: Object count adjusted.
    [542909.742536] =============================================================================
    [542909.742538] BUG kmalloc-8 (Tainted: G    B      OE  ------------  ): Redzone overwritten
    [542909.742539] -----------------------------------------------------------------------------
    
    [542909.742543] INFO: 0xffff9bb6902cf858-0xffff9bb6902cf85f. First byte 0x4c instead of 0xcc
    [542909.742545] INFO: Slab 0xffffe0933440b3c0 objects=102 used=95 fp=0xffff9bb6902cf558 flags=0x6fffff00000081
    [542909.742547] INFO: Object 0xffff9bb6902cf850 @offset=2128 fp=0x7f7f1b36102c7c10
    
    [542909.742550] Redzone ffff9bb6902cf848: cc cc cc cc cc cc cc cc                          ........
    [542909.742552] Object ffff9bb6902cf850: d0 0b d6 0b 88 01 00 25                          .......%
    [542909.742555] Redzone ffff9bb6902cf858: 4c 4c 4c 4c 4c 4c 4c 4c                          LLLLLLLL
    [542909.742557] Padding ffff9bb6902cf868: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
    [542909.742560] CPU: 25 PID: 50461 Comm: pool Kdump: loaded Tainted: G    B      OE  ------------   3.10.0-1062.9.1.el7.x86_64 #1
    [542909.742562] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/21/2019
    [542909.742563] Call Trace:
    [542909.742567]  [<ffffffffa277ac23>] dump_stack+0x19/0x1b
    [542909.742570]  [<ffffffffa2221561>] print_trailer+0x161/0x280
    [542909.742573]  [<ffffffffa22217ef>] check_bytes_and_report+0xcf/0x110
    [542909.742576]  [<ffffffffa222237d>] check_object+0x1dd/0x2a0
    [542909.742580]  [<ffffffffa27773cc>] free_debug_processing+0x8d/0x270
    [542909.742583]  [<ffffffffa21ddcb5>] ? kvfree+0x35/0x40
    [542909.742586]  [<ffffffffa2223bee>] __slab_free+0x1ce/0x290
    [542909.742590]  [<ffffffffa2272e58>] ? generic_setxattr+0x68/0x80
    [542909.742593]  [<ffffffffa2273635>] ? __vfs_setxattr_noperm+0x65/0x1b0
    [542909.742596]  [<ffffffffa232b7ae>] ? evm_inode_setxattr+0xe/0x10
    [542909.742599]  [<ffffffffa21ddcb5>] ? kvfree+0x35/0x40
    [542909.742602]  [<ffffffffa2223db6>] kfree+0x106/0x140
    [542909.742606]  [<ffffffffa21ddcb5>] kvfree+0x35/0x40
    [542909.742609]  [<ffffffffa227399b>] setxattr+0x15b/0x1e0
    [542909.742613]  [<ffffffffa225c3ed>] ? putname+0x3d/0x60
    [542909.742617]  [<ffffffffa225d602>] ? user_path_at_empty+0x72/0xc0
    [542909.742621]  [<ffffffffa224d828>] ? __sb_start_write+0x58/0x120
    [542909.742624]  [<ffffffffa22802f1>] ? do_utimes+0xf1/0x180
    [542909.742628]  [<ffffffffa2273c87>] SyS_setxattr+0xb7/0x100
    [542909.742631]  [<ffffffffa278dede>] system_call_fastpath+0x25/0x2a
    [542909.742635] FIX kmalloc-8: Restoring 0xffff9bb6902cf858-0xffff9bb6902cf85f=0xcc
    
    [542909.742648] FIX kmalloc-8: Object at 0xffff9bb6902cf850 not freed
    [542909.763926] general protection fault: 0000 [#1] SMP 
    [542909.792826] Modules linked in: tcp_diag inet_diag fuse nfsd mgc(OE) lustre(OE) lmv(OE) mdc(OE) fid(OE) osc(OE) lov(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) cts lnet(OE) rpcsec_gss_krb5 nfsv4 dns_resolver libcfs(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_recent xt_conntrack nf_conntrack iptable_filter mlx4_ib(OE) dm_mirror dm_region_hash dm_log dm_mod ib_uverbs(OE) ib_core(OE) sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel mgag200 mlx4_core(OE) iTCO_wdt iTCO_vendor_support ttm kvm drm_kms_helper irqbypass syscopyarea sysfillrect crc32_pclmul sysimgblt crc32c_intel
    [542910.218156]  fb_sys_fops mlx_compat(OE) ghash_clmulni_intel drm aesni_intel lrw gf128mul glue_helper ses ablk_helper devlink enclosure cryptd drm_panel_orientation_quirks hpwdt i2c_i801 pcspkr pcc_cpufreq wmi ioatdma ipmi_si acpi_power_meter ipmi_devintf ipmi_msghandler lpc_ich knem(OE) binfmt_misc auth_rpcgss ip_tables smartpqi bridge stp llc xfs isci libsas qla3xxx e1000e igb i2c_algo_bit megaraid_sas aacraid aic79xx ata_piix mpt2sas raid_class mptspi scsi_transport_spi mptsas mptscsih mptbase arcmsr ahci libahci sata_nv sata_svw bnx2x libcrc32c bnx2 ext4 mbcache jbd2 sata_sil libata tg3 e1000 nfsv3 nfs_acl nfs lockd grace sunrpc fscache tun sd_mod crc_t10dif crct10dif_generic sg ixgbe crct10dif_pclmul crct10dif_common hpsa dca mdio hpilo ptp scsi_transport_sas pps_core [last unloaded: ipmi_msghandler]
    [542910.624054] 
    [542910.625230] CPU: 27 PID: 25861 Comm: gdbus Kdump: loaded Tainted: G    B      OE  ------------   3.10.0-1062.9.1.el7.x86_64 #1
    [542910.685731] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/21/2019
    [542910.724144] task: ffff9ba5b5bc1070 ti: ffff9ba6067c0000 task.ti: ffff9ba6067c0000
    [542910.768155] RIP: 0010:[<ffffffffa21f711b>]  [<ffffffffa21f711b>] find_vma+0x3b/0x60
    [542910.810986] RSP: 0000:ffff9ba6067c3ea8  EFLAGS: 00010202
    [542910.840760] RAX: ffff9bb72066f1b8 RBX: 0000000000000004 RCX: ffff9ba6067c3fd8
    [542910.880983] RDX: 7fff9bb7c2fec608 RSI: 0000000000682888 RDI: ffff9ba002a34b00
    [542910.919946] RBP: ffff9ba6067c3ea8 R08: 0000000000000001 R09: 0000000000000000
    [542910.958846] R10: 000000000000001c R11: 00002aaaae480b40 R12: 00000000000000a8
    [542910.998593] R13: 0000000000682888 R14: ffff9ba6067c3f58 R15: ffff9ba002a34b00
    [542911.038992] FS:  00002aaabc395700(0000) GS:ffff9bb97f140000(0000) knlGS:0000000000000000
    [542911.095715] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [542911.155694] CR2: 0000000000682888 CR3: 0000003214b00000 CR4: 00000000003607e0
    [542911.202949] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [542911.265589] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [542911.315387] Call Trace:
    [542911.355844]  [<ffffffffa278857d>] __do_page_fault+0x13d/0x500
    [542911.413348]  [<ffffffffa2788975>] do_page_fault+0x35/0x90
    [542911.455443]  [<ffffffffa2784778>] page_fault+0x28/0x30
    [542911.495307] Code: 74 06 48 39 70 08 77 40 48 8b 57 08 31 c0 48 85 d2 75 18 eb 2e 0f 1f 00 48 3b 72 e0 48 8d 42 e0 73 1d 48 8b 52 10 48 85 d2 74 0f <48> 3b 72 e8 72 e7 48 8b 52 08 48 85 d2 75 f1 48 85 c0 74 04 48 
    [542911.665436] RIP  [<ffffffffa21f711b>] find_vma+0x3b/0x60
    [542911.695917]  RSP <ffff9ba6067c3ea8>
    
    -- 
    # Dr. Christopher Mountford
    # System specialist - Research Computing/HPC
    # 
    # IT services,
    #     University of Leicester, University Road, 
    #     Leicester, LE1 7RH, UK 
    #
    # t: 0116 252 3471
    # e: cjm14 at le.ac.uk
    
    


