[lustre-discuss] ZFS MDT Corruption

Christian Kuntz c.kuntz at opendrives.com
Fri Sep 16 13:25:28 PDT 2022


Oof! That's not a good situation to be in. Unfortunately, I've hit the dual
import situation before as well, and as far as I know, once two nodes have
imported a pool at the same time you're more or less hosed.

When it happened to me, I used zdb to read through the recent TXGs, trying
to backtrack the pool to a previously working state. Unfortunately none of
it worked; I think I tried about 30 in all. You could try that route, and
maybe you'll be luckier than I was.
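
For finding candidate TXGs, here is a minimal sketch. It assumes that
`zdb -lu <device>` prints each uberblock's transaction group as a line of
the form `txg = N` (that is what I have seen, but check your own output),
and `list_txgs` is just a helper name made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `zdb -lu <device>` output on stdin and print
# the unique uberblock TXGs, newest first.
# Assumption: uberblock entries contain lines like "    txg = 12345".
list_txgs() {
    awk '$1 == "txg" && $2 == "=" { print $3 }' | sort -rnu
}

# Usage (substitute one of the pool's member devices for <device>):
#   zdb -lu <device> | list_txgs | head -n 30
```

Reading labels with zdb does not write to the pool, so this part is safe to
repeat as often as you like.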

Now might be the time to dust off any remote backups you have or reach out
to ZFS recovery specialists. Additionally, _always_ enable multihost with
`zpool set multihost=on <poolname>` on any pool that can be imported by
more than one node, for exactly this reason. With multihost on, you can
safely override the hostid check with `zpool import -f`; without it, you
have no protection against simultaneous imports.
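
A rough sketch of turning multihost on, for reference. The pool name
`mdtpool` is a placeholder, and the script only prints the commands unless
you set DRY_RUN=0. It assumes each node needs its own unique hostid in
/etc/hostid for multihost to work, which `zgenhostid` can generate.

```shell
#!/bin/sh
# Sketch: enable multihost (MMP) on a pool shared between nodes.
POOL="${POOL:-mdtpool}"      # placeholder pool name, substitute your own
DRY_RUN="${DRY_RUN:-1}"      # 1 = only print commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

enable_multihost() {
    # Every node that may import the pool needs a unique hostid;
    # only generate one if /etc/hostid is missing or empty.
    [ -s /etc/hostid ] || run zgenhostid
    run zpool set multihost=on "$POOL"
    # Confirm the property took effect.
    run zpool get multihost "$POOL"
}

enable_multihost
```

The hostid step has to be done on every node; the property itself only
needs to be set once, on whichever node currently has the pool imported.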

For rollback, look into the `-X` and `-T` options of `zpool import` (`-X`
is used together with `-F`). The man page for `zdb` should be able to
answer most of your questions. Otherwise, a common actor in the ZFS
recovery scene is https://www.ufsexplorer.com/ (or at least as far as I've
seen).
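
If you do go the rollback route, something along these lines is how I
would approach it. It is a sketch, not a tested recovery procedure: the
pool name and the TXG numbers are placeholders, and it only prints the
commands unless DRY_RUN=0. It imports read-only and without mounting (`-N`)
so a failed attempt does not write to the pool. As I understand the
zpool-import man page, `-T` implies `-FX`.

```shell
#!/bin/sh
# Sketch: try a read-only rewind import at each candidate TXG in turn.
# Make sure NO other node has the pool imported before running this.
POOL="${POOL:-mdtpool}"      # placeholder pool name, substitute your own
DRY_RUN="${DRY_RUN:-1}"      # 1 = only print commands, 0 = execute them

rewind_import() {
    for txg in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "+ zpool import -N -o readonly=on -f -T $txg $POOL"
        elif zpool import -N -o readonly=on -f -T "$txg" "$POOL"; then
            echo "imported $POOL read-only at txg $txg"
            return 0
        fi
    done
    # In dry-run mode nothing was attempted, so report success;
    # otherwise every candidate TXG failed.
    if [ "$DRY_RUN" = "1" ]; then
        return 0
    fi
    return 1
}

# Usage, newest candidate first (TXG numbers are placeholders):
#   DRY_RUN=0 rewind_import 1234567 1234566 1234565
```

Before any of this, `zpool import -F -n <poolname>` will tell you whether a
plain recovery import would work, without actually performing it.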

Sorry for the bad news :(
Christian