[lustre-discuss] 2.12.6 freeze

Alastair Basden a.g.basden at durham.ac.uk
Mon Nov 29 14:55:24 PST 2021


Additional info - exporting the pool, importing it on another (HA) server, 
and attempting to mount there hits the same problem, i.e. a kernel panic 
with the trace shown below.

A writeconf does not help.
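
The writeconf we tried was along the usual lines (a sketch; the exact 
invocation may have differed):

zpool import -o cachefile=none pool1
tunefs.lustre --writeconf pool1/pool1   # regenerate the config logs for this target

followed by remounting the target; the LBUG is unchanged.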

On Mon, 29 Nov 2021, Alastair Basden wrote:

>
> Some more information.  This is repeatable (the file system has previously
> been fine - it's an established file system).
>
> To get this, we boot the node and then do:
>
> zpool import -o cachefile=none pool1
>
> zpool status shows that all is well.  We then run:
>
> mount -t lustre pool1/pool1 /mnt/lustre/pool1
>
> and the kernel panics.
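>
> A minimal sketch for capturing more Lustre debug context on the next
> attempt, assuming the node survives long enough to dump the buffer (the
> output path is illustrative):
>
> lctl set_param debug=-1                        # enable all Lustre debug flags
> mount -t lustre pool1/pool1 /mnt/lustre/pool1  # triggers the LBUG
> lctl dk /tmp/lustre-debug.log                  # dump the kernel debug buffer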
>
>
> Some additional logs in /var/log/messages:
> Nov 29 18:37:54 c8oss01 kernel: LNet: HW NUMA nodes: 2, HW CPU cores: 128, npartitions: 2
> Nov 29 18:37:54 c8oss01 kernel: alg: No test for adler32 (adler32-zlib)
> Nov 29 18:37:55 c8oss01 kernel: Lustre: Lustre: Build Version: 2.12.6
> Nov 29 18:37:55 c8oss01 kernel: LNet: 40260:0:(config.c:1641:lnet_inet_enumerate()) lnet: Ignoring interface em2: it's down
> Nov 29 18:37:55 c8oss01 kernel: LNet: Using FastReg for registration
> Nov 29 18:37:55 c8oss01 kernel: LNet: Added LNI 172.18.185.5 at o2ib [32/512/0/100]
> Nov 29 18:37:55 c8oss01 kernel: LNet: Added LNI 172.17.185.5 at tcp [8/256/0/180]
> Nov 29 18:37:55 c8oss01 kernel: LNet: Accept secure, port 988
> Nov 29 18:37:55 c8oss01 zed: eid=85 class=data pool_guid=0x07C7BF473C816BCB
> Nov 29 18:37:55 c8oss01 kernel: LustreError: 40228:0:(lu_object.c:1267:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
> Nov 29 18:37:55 c8oss01 kernel: LustreError: 40228:0:(lu_object.c:1267:lu_device_fini()) LBUG
> Nov 29 18:37:55 c8oss01 kernel: Pid: 40228, comm: mount.lustre 3.10.0-1160.2.1.el7_lustre.x86_64 #1 SMP Wed Dec 9 20:53:35 UTC 2020
> Nov 29 18:37:55 c8oss01 zed: eid=86 class=checksum pool_guid=0x07C7BF473C816BCB vdev_path=/dev/mapper/mpathk
> Nov 29 18:37:55 c8oss01 kernel: Call Trace:
> Nov 29 18:37:56 c8oss01 kernel: [<ffffffffc09687cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
> Nov 29 18:37:56 c8oss01 zed: eid=87 class=checksum pool_guid=0x07C7BF473C816BCB vdev_path=/dev/mapper/mpatheg
> Nov 29 18:37:56 c8oss01 kernel: [<ffffffffc096887c>] lbug_with_loc+0x4c/0xa0 [libcfs]
> Nov 29 18:37:56 c8oss01 kernel: [<ffffffffc19afa9b>] lu_device_fini+0xbb/0xc0 [obdclass]
> Nov 29 18:37:56 c8oss01 zed: eid=88 class=checksum pool_guid=0x07C7BF473C816BCB vdev_path=/dev/mapper/mpathbj
> Nov 29 18:37:56 c8oss01 kernel: [<ffffffffc19b595e>] dt_device_fini+0xe/0x10 [obdclass]
> Nov 29 18:37:56 c8oss01 kernel: [<ffffffffc0b82248>] osd_device_alloc+0x278/0x3b0 [osd_zfs]
> Nov 29 18:37:56 c8oss01 zed: eid=89 class=checksum pool_guid=0x07C7BF473C816BCB vdev_path=/dev/mapper/mpathag
> Nov 29 18:37:56 c8oss01 kernel: [<ffffffffc199e9c9>] obd_setup+0x119/0x280 [obdclass]
> Nov 29 18:37:56 c8oss01 zed: eid=90 class=checksum pool_guid=0x07C7BF473C816BCB vdev_path=/dev/mapper/mpathaf
> Nov 29 18:37:56 c8oss01 kernel: [<ffffffffc199edd8>] class_setup+0x2a8/0x840 [obdclass]
> Nov 29 18:37:56 c8oss01 kernel: [<ffffffffc19a1e86>] class_process_config+0x1726/0x2830 [obdclass]
> Nov 29 18:37:56 c8oss01 zed: eid=91 class=checksum pool_guid=0x07C7BF473C816BCB vdev_path=/dev/mapper/mpathep
> Nov 29 18:37:56 c8oss01 kernel: [<ffffffffc19a63a8>] do_lcfg+0x258/0x500 [obdclass]
> Nov 29 18:37:56 c8oss01 kernel: [<ffffffffc19aabd8>] lustre_start_simple+0x88/0x210 [obdclass]
> Nov 29 18:37:56 c8oss01 zed: eid=92 class=checksum pool_guid=0x07C7BF473C816BCB vdev_path=/dev/mapper/mpathk
> Nov 29 18:37:56 c8oss01 kernel: [<ffffffffc19d9455>] server_fill_super+0xf55/0x1890 [obdclass]
> Nov 29 18:37:56 c8oss01 zed: eid=93 class=checksum pool_guid=0x07C7BF473C816BCB vdev_path=/dev/mapper/mpatheg
> Nov 29 18:37:56 c8oss01 kernel: [<ffffffffc19ade18>] lustre_fill_super+0x468/0x960 [obdclass]
> Nov 29 18:37:56 c8oss01 kernel: [<ffffffffac651a5f>] mount_nodev+0x4f/0xb0
> Nov 29 18:37:56 c8oss01 zed: eid=94 class=checksum pool_guid=0x07C7BF473C816BCB vdev_path=/dev/mapper/mpathbj
> Nov 29 18:37:56 c8oss01 kernel: [<ffffffffc19a5db8>] lustre_mount+0x38/0x60 [obdclass]
> Nov 29 18:37:56 c8oss01 kernel: [<ffffffffac6525de>] mount_fs+0x3e/0x1b0
> Nov 29 18:37:56 c8oss01 zed: eid=95 class=checksum pool_guid=0x07C7BF473C816BCB vdev_path=/dev/mapper/mpathag
> Nov 29 18:37:56 c8oss01 kernel: [<ffffffffac671297>] vfs_kern_mount+0x67/0x110
> Nov 29 18:37:56 c8oss01 kernel: [<ffffffffac6739cf>] do_mount+0x1ef/0xd00
> Nov 29 18:37:56 c8oss01 zed: eid=96 class=checksum pool_guid=0x07C7BF473C816BCB vdev_path=/dev/mapper/mpathaf
> Nov 29 18:37:56 c8oss01 kernel: [<ffffffffac674823>] SyS_mount+0x83/0xd0
> Nov 29 18:37:56 c8oss01 kernel: [<ffffffffacb93f92>] system_call_fastpath+0x25/0x2a
> Nov 29 18:37:56 c8oss01 zed: eid=97 class=checksum pool_guid=0x07C7BF473C816BCB vdev_path=/dev/mapper/mpathep
> Nov 29 18:37:56 c8oss01 kernel: [<ffffffffffffffff>] 0xffffffffffffffff
>
> We suspect corruption on the OST caused by a stonith event, but could be
> wrong.  Any tips on how to resolve this manually would be great...
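>
> Given the zed checksum events interleaved with the trace above, a scrub
> should tell us whether the pool itself is damaged.  A sketch, using the
> pool name from above:
>
> zpool scrub pool1
> zpool status -v pool1   # lists files/objects with unrecoverable errors once the scrub completes
> zpool events -v         # shows the checksum events that zed reported above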
>
> Thanks,
> Alastair.
>
> On Mon, 29 Nov 2021, Alastair Basden wrote:
>
>> 
>> Hi all,
>> 
>> Upon attempting to mount a ZFS OST, we are getting:
>> Message from syslogd at c8oss01 at Nov 29 18:11:47 ...
>> kernel:LustreError: 58223:0:(lu_object.c:1267:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
>> 
>> Message from syslogd at c8oss01 at Nov 29 18:11:47 ...
>> kernel:LustreError: 58223:0:(lu_object.c:1267:lu_device_fini()) LBUG
>> 
>> 
>> Followed by a system freeze.
>> 
>> Has anyone else seen this?  Any ideas?
>> 
>> Thanks,
>> Alastair.