[lustre-discuss] 2.12.6 freeze

Alastair Basden a.g.basden at durham.ac.uk
Wed Dec 1 01:09:06 PST 2021


Hi,

It turns out there is a problem with the zpool, which we think got corrupted 
by a stonith event triggered when a disk in another pool started reporting a 
predicted failure.

A zpool scrub has been done, and there are 5 files with permanent errors 
(zpool status -v):
errors: Permanent errors have been detected in the following files:

         cos8-ost6/ost6:<0xe>
         cos8-ost6/ost6:<0x1a>
         cos8-ost6/ost6:<0x1c>
         cos8-ost6/ost6:/
         cos8-ost6/ost6:<0x193>
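
(For reference, the scrub and check were along these lines; cos8-ost6 is the 
pool name:)

         zpool scrub cos8-ost6         # full scrub of the pool
         zpool status -v cos8-ost6     # -v lists files affected by permanent errors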

The fact that / is corrupted is particularly worrying!
If we set the canmount=on property and mount the dataset, an ls of the 
mount point gives an Input/output error.
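
(Roughly the steps, for clarity; the mountpoint below is just illustrative, 
since the dataset is normally left with canmount=off for a Lustre OST:)

         zfs set canmount=on cos8-ost6/ost6
         zfs set mountpoint=/mnt/ost6-test cos8-ost6/ost6   # temporary, illustrative mountpoint
         zfs mount cos8-ost6/ost6
         ls /mnt/ost6-test                                  # -> Input/output error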

Does anyone have experience with how to repair this?

There is no hardware problem: all 12 disks within this raidz2 pool are fine. 
We think the stonith must have caused it, though I thought ZFS was supposed 
to be immune to that!
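
(By "fine" I mean, roughly, that checks along these lines come back clean; 
the device name is just an example:)

         zpool status cos8-ost6     # all vdevs ONLINE, READ/WRITE/CKSUM counters at 0
         smartctl -a /dev/sdX       # per-disk SMART health, no failing attributes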

Thanks...


On Tue, 30 Nov 2021, Tommi Tervo wrote:

>> Upon attempting to mount a zfs OST, we are getting:
>> Message from syslogd at c8oss01 at Nov 29 18:11:47 ...
>>  kernel:LustreError: 58223:0:(lu_object.c:1267:lu_device_fini())
>> ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
>>
>> Message from syslogd at c8oss01 at Nov 29 18:11:47 ...
>>  kernel:LustreError: 58223:0:(lu_object.c:1267:lu_device_fini()) LBUG
>
> Hi,
>
> Looks like LU-12675, time to upgrade to 2.12.7?
>
> https://jira.whamcloud.com/browse/LU-12675
>
> HTH,
> -Tommi
>

