[lustre-discuss] 2.12.6 freeze

Alastair Basden a.g.basden at durham.ac.uk
Mon Nov 29 10:42:13 PST 2021


Some more information: this is repeatable. (Previously the file system 
was fine; it's an established file system.)

To reproduce it, we boot the node and then run:
zpool import -o cachefile=none  pool1
zpool status shows all is well.

mount -t lustre pool1/pool1 /mnt/lustre/pool1

And the kernel panics.


Some additional logs in /var/log/messages:
Nov 29 18:37:54 c8oss01 kernel: LNet: HW NUMA nodes: 2, HW CPU cores: 128, npartitions: 2
Nov 29 18:37:54 c8oss01 kernel: alg: No test for adler32 (adler32-zlib)
Nov 29 18:37:55 c8oss01 kernel: Lustre: Lustre: Build Version: 2.12.6
Nov 29 18:37:55 c8oss01 kernel: LNet: 40260:0:(config.c:1641:lnet_inet_enumerate()) lnet: Ignoring interface em2: it's down
Nov 29 18:37:55 c8oss01 kernel: LNet: Using FastReg for registration
Nov 29 18:37:55 c8oss01 kernel: LNet: Added LNI 172.18.185.5@o2ib [32/512/0/100]
Nov 29 18:37:55 c8oss01 kernel: LNet: Added LNI 172.17.185.5@tcp [8/256/0/180]
Nov 29 18:37:55 c8oss01 kernel: LNet: Accept secure, port 988
Nov 29 18:37:55 c8oss01 zed: eid=85 class=data pool_guid=0x07C7BF473C816BCB
Nov 29 18:37:55 c8oss01 kernel: LustreError: 40228:0:(lu_object.c:1267:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
Nov 29 18:37:55 c8oss01 kernel: LustreError: 40228:0:(lu_object.c:1267:lu_device_fini()) LBUG
Nov 29 18:37:55 c8oss01 kernel: Pid: 40228, comm: mount.lustre 3.10.0-1160.2.1.el7_lustre.x86_64 #1 SMP Wed Dec 9 20:53:35 UTC 2020
Nov 29 18:37:55 c8oss01 zed: eid=86 class=checksum pool_guid=0x07C7BF473C816BCB vdev_path=/dev/mapper/mpathk
Nov 29 18:37:55 c8oss01 kernel: Call Trace:
Nov 29 18:37:56 c8oss01 kernel: [<ffffffffc09687cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
Nov 29 18:37:56 c8oss01 zed: eid=87 class=checksum pool_guid=0x07C7BF473C816BCB vdev_path=/dev/mapper/mpatheg
Nov 29 18:37:56 c8oss01 kernel: [<ffffffffc096887c>] lbug_with_loc+0x4c/0xa0 [libcfs]
Nov 29 18:37:56 c8oss01 kernel: [<ffffffffc19afa9b>] lu_device_fini+0xbb/0xc0 [obdclass]
Nov 29 18:37:56 c8oss01 zed: eid=88 class=checksum pool_guid=0x07C7BF473C816BCB vdev_path=/dev/mapper/mpathbj
Nov 29 18:37:56 c8oss01 kernel: [<ffffffffc19b595e>] dt_device_fini+0xe/0x10 [obdclass]
Nov 29 18:37:56 c8oss01 kernel: [<ffffffffc0b82248>] osd_device_alloc+0x278/0x3b0 [osd_zfs]
Nov 29 18:37:56 c8oss01 zed: eid=89 class=checksum pool_guid=0x07C7BF473C816BCB vdev_path=/dev/mapper/mpathag
Nov 29 18:37:56 c8oss01 kernel: [<ffffffffc199e9c9>] obd_setup+0x119/0x280 [obdclass]
Nov 29 18:37:56 c8oss01 zed: eid=90 class=checksum pool_guid=0x07C7BF473C816BCB vdev_path=/dev/mapper/mpathaf
Nov 29 18:37:56 c8oss01 kernel: [<ffffffffc199edd8>] class_setup+0x2a8/0x840 [obdclass]
Nov 29 18:37:56 c8oss01 kernel: [<ffffffffc19a1e86>] class_process_config+0x1726/0x2830 [obdclass]
Nov 29 18:37:56 c8oss01 zed: eid=91 class=checksum pool_guid=0x07C7BF473C816BCB vdev_path=/dev/mapper/mpathep
Nov 29 18:37:56 c8oss01 kernel: [<ffffffffc19a63a8>] do_lcfg+0x258/0x500 [obdclass]
Nov 29 18:37:56 c8oss01 kernel: [<ffffffffc19aabd8>] lustre_start_simple+0x88/0x210 [obdclass]
Nov 29 18:37:56 c8oss01 zed: eid=92 class=checksum pool_guid=0x07C7BF473C816BCB vdev_path=/dev/mapper/mpathk
Nov 29 18:37:56 c8oss01 kernel: [<ffffffffc19d9455>] server_fill_super+0xf55/0x1890 [obdclass]
Nov 29 18:37:56 c8oss01 zed: eid=93 class=checksum pool_guid=0x07C7BF473C816BCB vdev_path=/dev/mapper/mpatheg
Nov 29 18:37:56 c8oss01 kernel: [<ffffffffc19ade18>] lustre_fill_super+0x468/0x960 [obdclass]
Nov 29 18:37:56 c8oss01 kernel: [<ffffffffac651a5f>] mount_nodev+0x4f/0xb0
Nov 29 18:37:56 c8oss01 zed: eid=94 class=checksum pool_guid=0x07C7BF473C816BCB vdev_path=/dev/mapper/mpathbj
Nov 29 18:37:56 c8oss01 kernel: [<ffffffffc19a5db8>] lustre_mount+0x38/0x60 [obdclass]
Nov 29 18:37:56 c8oss01 kernel: [<ffffffffac6525de>] mount_fs+0x3e/0x1b0
Nov 29 18:37:56 c8oss01 zed: eid=95 class=checksum pool_guid=0x07C7BF473C816BCB vdev_path=/dev/mapper/mpathag
Nov 29 18:37:56 c8oss01 kernel: [<ffffffffac671297>] vfs_kern_mount+0x67/0x110
Nov 29 18:37:56 c8oss01 kernel: [<ffffffffac6739cf>] do_mount+0x1ef/0xd00
Nov 29 18:37:56 c8oss01 zed: eid=96 class=checksum pool_guid=0x07C7BF473C816BCB vdev_path=/dev/mapper/mpathaf
Nov 29 18:37:56 c8oss01 kernel: [<ffffffffac674823>] SyS_mount+0x83/0xd0
Nov 29 18:37:56 c8oss01 kernel: [<ffffffffacb93f92>] system_call_fastpath+0x25/0x2a
Nov 29 18:37:56 c8oss01 zed: eid=97 class=checksum pool_guid=0x07C7BF473C816BCB vdev_path=/dev/mapper/mpathep
Nov 29 18:37:56 c8oss01 kernel: [<ffffffffffffffff>] 0xffffffffffffffff
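The zed lines interleaved with the trace report checksum errors against several multipath vdevs. A rough way to tally them per device is the shell sketch that follows. It assumes the zed messages land in /var/log/messages in the same format as the excerpt, and count_checksum_events is just a hypothetical helper name, not part of ZFS or Lustre.

```shell
#!/bin/sh
# Hypothetical helper: tally ZFS checksum events per vdev from zed syslog lines.
# Assumes lines shaped like the excerpt, e.g.
#   ... zed: eid=86 class=checksum pool_guid=0x... vdev_path=/dev/mapper/mpathk
count_checksum_events() {
    # Keep only zed checksum events (kernel lines and class=data are skipped),
    # extract the vdev_path value, then count per device, most frequent first.
    grep 'zed: .*class=checksum' |
        sed -n 's/.*vdev_path=\([^ ]*\).*/\1/p' |
        sort | uniq -c | sort -rn
}

# Usage (reads the log on stdin):
#   count_checksum_events < /var/log/messages
```

Run over the excerpt above, it should report two checksum events on each of mpathk, mpatheg, mpathbj, mpathag, mpathaf and mpathep; "zpool status -v pool1" is the authoritative place to confirm the per-device error counters.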

We suspect corruption on the OST caused by a STONITH event, but we could 
be wrong. Any tips on how to fix this manually would be great.

Thanks,
Alastair.

On Mon, 29 Nov 2021, Alastair Basden wrote:

> Hi all,
>
> Upon attempting to mount a ZFS OST, we are getting:
> Message from syslogd@c8oss01 at Nov 29 18:11:47 ...
> kernel:LustreError: 58223:0:(lu_object.c:1267:lu_device_fini())
> ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
>
> Message from syslogd@c8oss01 at Nov 29 18:11:47 ...
> kernel:LustreError: 58223:0:(lu_object.c:1267:lu_device_fini()) LBUG
>
>
> Followed by a system freeze.
>
> Has anyone else seen this?  Any ideas?
>
> Thanks,
> Alastair.

