[lustre-discuss] LustreError on ZFS volumes

Jesse Stroik jesse.stroik at ssec.wisc.edu
Mon Dec 12 13:39:03 PST 2016

Thanks for taking the time to respond, Tom,

> For clarification, it sounds like you are using hardware-based RAID-6, and not ZFS RAID? Is this correct? Or was the faulty card simply an HBA?

You are correct. This particular file system is still using hardware RAID6.

> At the bottom of the ‘zpool status -v pool_name’ output, you may see paths and/or ZFS object IDs of the damaged/impacted files. This would be good to take note of.

Yes, I output this to files at a few different times and we've had no 
change since replacing the RAID controller, which makes me feel 
reasonably comfortable leaving the file system in production.
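
For anyone wanting to check the same thing, here is a rough sketch of how two saved captures of 'zpool status -v' could be compared. It assumes the damaged entries appear as indented lines under a header beginning "errors: Permanent errors", and that they run to the end of the capture. The function names and capture file names are made up for illustration.

```shell
# list_damaged FILE
# Print the entries listed under the "errors: Permanent errors ..." header
# of a saved `zpool status -v` capture, one per line, sorted and de-duplicated.
# Assumption: the entries are the non-blank lines that follow that header.
list_damaged() {
    awk '
        /^errors: Permanent errors/ { in_list = 1; next }
        in_list && NF { sub(/^[ \t]+/, ""); print }
    ' "$1" | sort -u
}

# compare_damaged OLD NEW
# Print entries that appear in only one of the two captures.
# Entries only in NEW are prefixed with a tab (comm column 2).
# Empty output means the damaged-object list did not change.
compare_damaged() {
    old_list=$(mktemp) && new_list=$(mktemp) || return 1
    list_damaged "$1" > "$old_list"
    list_damaged "$2" > "$new_list"
    comm -3 "$old_list" "$new_list"
    rm -f "$old_list" "$new_list"
}

# Example (hypothetical capture files):
#   compare_damaged status_before_swap.txt status_after_swap.txt
```

No output from compare_damaged means the two captures list the same objects. 'list_damaged FILE | wc -l' gives the count to set against the number of inaccessible files.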

There are 370 objects listed by zpool status -v but I am unable to 
access at least 400 files. Almost all of our files are single stripe.

> Running a ‘zpool scrub’ is a good idea. If the zpool is protected with "ZFS raid", the scrub may be able to repair some of the damage. If the zpool is not protected with "ZFS raid", the scrub will identify any other errors, but likely NOT repair any of the damage.

We're not protected with ZFS RAID, just hardware RAID6. I could run a 
patrol on the hardware controller and then a ZFS scrub if that makes the 
most sense at this point. This file system is scheduled to run a scrub 
the third week of every month, so it would run one this weekend otherwise.

> If you have enough disk space on hardware that is behaving properly (and free space in the source zpool), you may want to replicate the VDEV’s (OST) that are reporting errors. Having a replicated VDEV can afford you the ability to examine the data without fear of further damage. You may also want to extract certain files from the replicated VDEV(s) which are producing IO errors on the source VDEV.
> Something like this for replication should work:
> zfs snap source_pool/source_ost@timestamp_label
> zfs send -Rv source_pool/source_ost@timestamp_label | zfs receive destination_pool/source_ost_replicated
> You will need to set zfs_send_corrupt_data to 1 in /sys/module/zfs/parameters or the ‘zfs send’ will error and fail when sending a VDEV with read and/or checksum errors.
> Enabling zfs_send_corrupt_data allows the zfs send operation to complete. Any blocks that are damaged on the source side will be filled with the pattern 0x2f5baddb10c on the destination side. This can be helpful in troubleshooting whether an entire file is corrupt, or only parts of the file.
> After the replication, you should set the replicated VDEV to read only with ‘zfs set readonly=on destination_pool/source_ost_replicated’

Thank you for this suggestion. We'll most likely do that.
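
For reference, the quoted steps could be strung together roughly as below. This is only a sketch: the dataset names are the placeholders from the quoted message, and the run and replicate_ost helpers and the DRY_RUN switch are my own additions for illustration. It uses 'zfs snapshot', the long form of 'zfs snap'. With DRY_RUN=1 (the default here) it only prints the commands so they can be reviewed first.

```shell
# DRY_RUN=1 (default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

# run CMD: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        sh -c "$*"
    fi
}

# replicate_ost SRC_DATASET DST_DATASET LABEL
# Follows the quoted steps in order:
#   1. allow 'zfs send' to continue past damaged blocks
#   2. snapshot the source dataset
#   3. send the snapshot to the destination pool
#   4. mark the replica read-only
replicate_ost() {
    src="$1"; dst="$2"; label="$3"
    run "echo 1 > /sys/module/zfs/parameters/zfs_send_corrupt_data"
    run "zfs snapshot ${src}@${label}"
    run "zfs send -Rv ${src}@${label} | zfs receive ${dst}"
    run "zfs set readonly=on ${dst}"
}

# Example (placeholder names):
#   replicate_ost source_pool/source_ost destination_pool/source_ost_replicated timestamp_label
```

Setting DRY_RUN=0 would actually execute the commands, which needs root and enough free space in the destination pool.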

Jesse Stroik
