[lustre-discuss] lustre/ZFS resilver order question

Tue Sep 27 03:34:16 PDT 2022

Hi,

This is possibly more of a ZFS question than a Lustre one, but it involves ZFS 0.7.13, which I think is now mostly used with Lustre.

One of our ZFS (RAID-Z2) OSTs has 2 faulty disks:

[root@aoss01a ~]# zpool status aliceost02
  pool: aliceost02
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Sep 27 11:10:44 2022
        2.86G scanned out of 71.0T at 29.9M/s, 691h54m to go
        288M resilvered, 0.00% done
config:

        NAME                    STATE     READ WRITE CKSUM
        aliceost02              DEGRADED     0     0     0
          raidz2-0              DEGRADED     0     0     0
            A11-aoss01j1-021    ONLINE       0     0     0
            spare-1             DEGRADED     0     0     0
              A11-aoss01j1-022  OFFLINE     34   633     0
              A11-aoss01j2-084  ONLINE       0     0     0  (resilvering)
            A11-aoss01j1-023    ONLINE       0     0     0
            A11-aoss01j1-024    ONLINE       0     0     0
            A11-aoss01j1-025    ONLINE       0     0     0
            A11-aoss01j1-026    ONLINE     157   546     0  (resilvering)
            A11-aoss01j1-027    ONLINE       0     0     0
            A11-aoss01j1-028    ONLINE       0     0     0
            A11-aoss01j1-029    ONLINE       0     0     0
            A11-aoss01j1-030    ONLINE       0     0     0
        spares
          A11-aoss01j2-084      INUSE     currently in use

The device A11-aoss01j1-022 failed first; a spare (A11-aoss01j2-084) was assigned and a resilver started. Since then a second disk (A11-aoss01j1-026) has failed. Both disks are shown as resilvering. Am I right in thinking that the second disk's resilver will have completely interrupted the first (this is what the resilver time implies), and that the resilver of A11-aoss01j1-022 will not restart until the resilver of A11-aoss01j1-026 is complete?
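
As a rough sanity check of the "to go" figure in the status output above, the arithmetic can be sketched as follows. This assumes that zpool's T/G/M suffixes are binary units and that the scan rate stays constant; the function name is only illustrative.

```python
# Rough check of the ETA reported by `zpool status`.
# Assumption: T/G/M in the status output mean TiB/GiB/MiB.
MIB_PER_GIB = 1024
MIB_PER_TIB = 1024 * 1024


def resilver_eta_hours(total_tib, scanned_gib, rate_mib_s):
    """Hours remaining if the scan continues at a constant rate."""
    remaining_mib = total_tib * MIB_PER_TIB - scanned_gib * MIB_PER_GIB
    return remaining_mib / rate_mib_s / 3600


# Figures taken from the status output: 2.86G of 71.0T at 29.9M/s.
eta = resilver_eta_hours(total_tib=71.0, scanned_gib=2.86, rate_mib_s=29.9)
print(f"{eta:.0f} hours, about {eta / 24:.0f} days")
```

This gives roughly 692 hours, in line with the reported 691h54m (the small difference is presumably rounding of the displayed rate), so the estimate is consistent with a scan of the whole pool starting again from near zero.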

My reading of the ZoL docs is that versions of ZFS after 0.8 defer new resilvers until the older one has completed, but older versions interrupt the running resilver. Is that correct?

If my understanding of this is correct, then A11-aoss01j1-026 needs to be offlined and replaced as well, and we just have to hope we don't get a 3rd disk failure in the ~70 hours required to resilver at least one of the offline disks!
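
For scale, here is a small sketch of the sustained scan rate that a ~70 hour resilver would need. It assumes the full 71.0T shown by zpool status has to be scanned and that T means TiB; the function name is only illustrative.

```python
def required_rate_mib_s(total_tib, hours):
    """Sustained scan rate (MiB/s) needed to cover total_tib in the given hours."""
    total_mib = total_tib * 1024 * 1024
    return total_mib / (hours * 3600)


# 71.0T is the pool figure from zpool status; 70 hours is the estimate above.
rate = required_rate_mib_s(total_tib=71.0, hours=70)
print(f"{rate:.0f} MiB/s")
```

Under those assumptions the pool would need to sustain about 295 MiB/s, roughly ten times the 29.9M/s reported at the start of the current resilver.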

Any advice or ZFS knowledge appreciated.

(I've disabled stripe creation on the OST to try to reduce I/O on the pool and set zfs_resilver_delay to 0 to prioritize the resilver. Is there anything else we could do to speed it up?)

Kind Regards,
Christopher.