[lustre-discuss] ZFS MDT Corruption

Scott Ruffner prescott.ruffner at gmail.com
Thu Sep 15 17:42:47 PDT 2022


Hi Everyone,

This is more of a ZFS than a Lustre question. Our Lustre MDT HA pair
(managed by Corosync and Pacemaker) got into a split-brain condition with
the ZFS zpool backing the MDT: on inspection, both HA nodes had the MDT
zpool imported simultaneously. Manually exporting the pool from the node
that was failing over initially appeared to resolve the issue, but the
second node still failed to mount the pool due to errors, despite having it
imported.
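For reference, here's roughly how I've been checking which node the on-disk
labels say owns the pool (pool and device names below are placeholders for
our actual ones):

```shell
# Read the ZFS labels directly from one of the pool's member disks.
# The hostid/hostname/state/txg fields show which node last owned the pool.
# /dev/sdX is a placeholder for one of the mirror members.
zdb -l /dev/sdX | grep -E 'hostid|hostname|state|txg'

# Compare against the current node's hostid:
hostid

# If the pool is imported somewhere, check whether MMP is enabled
# ("mdt0pool" is a placeholder pool name):
zpool get multihost mdt0pool
```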

Now corruption is reported on all of the mirror vdevs that make up the MDT
pool (the GPT pool on the same two nodes is fine).

If I bring a node up without its hostid configured, the mirror vdevs are
reported as healthy, but I'm still unable to import the pool with zpool
import, even when trying to override with -o multihost=off.
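For completeness, these are the import variants I've tried (pool name is a
placeholder; -f skips the "pool was last accessed by another system"
hostid check, and multihost is an on/off property):

```shell
# Plain import by name; fails with the "in use by another system" error.
zpool import mdt0pool

# Force the import, overriding the last-mounted hostid check.
# Only safe once the other node definitely has the pool exported.
zpool import -f mdt0pool

# Disable MMP (multihost) at import time.
zpool import -f -o multihost=off mdt0pool

# If the pool imports but is left suspended by MMP, clear the suspension.
zpool clear mdt0pool
```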

I actually suspect that the data itself is intact and not corrupted, but
that the "last mounted" metadata is stale, so each system believes the
other still has the pool mounted.

I'm stumped on getting the MDT pool re-imported on any node, but I may be
missing something.

Scott Ruffner
