[lustre-discuss] Error on a zpool underlying an OST

Fred Liu Fred_Liu at issi.com
Fri Mar 11 19:30:47 PST 2016

You can try the recovery options of "zpool import" (they rarely help), but most likely you will have to rebuild the zpool.
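
For reference, a minimal sketch of what those recovery attempts look like, assuming the documented "-F" (recovery mode) and "-n" (dry run, used with -F) flags of "zpool import". The helper name recovery_import_commands is hypothetical, and the script only prints the commands rather than running them. These flags apply at import time, so they are only relevant to a pool that is not currently imported.

```python
import shlex


def recovery_import_commands(pool):
    """Return `zpool import` recovery attempts, least invasive first.

    Hypothetical helper for illustration: it builds argument lists and
    never executes anything.
    """
    return [
        # Dry run: report whether discarding recent transactions could
        # make the pool importable, without changing anything.
        ["zpool", "import", "-F", "-n", pool],
        # Recovery mode: actually discard the last few transactions.
        ["zpool", "import", "-F", pool],
    ]


if __name__ == "__main__":
    for cmd in recovery_import_commands("ost-007"):
        print(shlex.join(cmd))
```

Running the dry-run form first keeps the attempt non-destructive; the second form gives up the most recent writes, so it is worth reviewing the dry-run result before going further.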



On Fri, Mar 11, 2016 at 5:19 PM -0800, "Bob Ball" <ball at umich.edu> wrote:

Hi, we have Lustre 2.7.58 in place on our OST and MDT/MGS (combined).
Underlying the lustre file system is a raid-z2 zfs pool.

A few days ago, we lost 2 disks at once from the raid-z2.  I replaced
one and a resilver started, but it seemed to choke.  So I swapped both
failed disks for replacements, and the new resilver now shows the following.

[root at umdist03 ~]# zpool status -v ost-007
   pool: ost-007
  state: DEGRADED
status: One or more devices has experienced an error resulting in data
         corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
         entire pool from backup.
    see: http://zfsonlinux.org/msg/ZFS-8000-8A
   scan: resilvered 972G in 9h25m with 1 errors on Fri Mar 11 19:12:37 2016

         NAME                                  STATE     READ WRITE CKSUM
         ost-007                               DEGRADED     0     0     1
           raidz2-0                            DEGRADED     0     0     4
             replacing-0                       DEGRADED     0     0     0
               18280868502819750645            UNAVAIL      0     0     0  was /dev/disk/by-path/pci-0000:0c:00.0-scsi-0:2:20:0-part1/old
               pci-0000:0c:00.0-scsi-0:2:20:0  ONLINE       0     0     0
             pci-0000:0c:00.0-scsi-0:2:21:0    ONLINE       0     0     0
             pci-0000:0c:00.0-scsi-0:2:22:0    ONLINE       0     0     0
             pci-0000:0c:00.0-scsi-0:2:23:0    ONLINE       0     0     0
             pci-0000:0c:00.0-scsi-0:2:24:0    ONLINE       0     0     0
             pci-0000:0c:00.0-scsi-0:2:35:0    ONLINE       0     0     0
             pci-0000:0c:00.0-scsi-0:2:36:0    ONLINE       1     0     0
             pci-0000:0c:00.0-scsi-0:2:37:0    ONLINE       0     0     0
             pci-0000:0c:00.0-scsi-0:2:38:0    ONLINE       0     0     0
             replacing-9                       UNAVAIL      0     0     0
               14369532488179106769            UNAVAIL      0     0     0  was /dev/disk/by-path/pci-0000:0c:00.0-scsi-0:2:39:0-part1/old
               pci-0000:0c:00.0-scsi-0:2:39:0  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:


What are my options here?  If I don't care about the file, can I
identify it and then just delete it?  Or is my only real option to drain
the pool and rebuild it cleanly?
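
As a first step toward identifying the damaged file, the entries under the "errors:" header of "zpool status -v" can be pulled out of captured output. The following is a rough sketch under stated assumptions: the function names are hypothetical, and the sample text (including the dataset name "ost-007/ost0" and the object numbers) is made up for illustration, since the file list did not survive in the output above. ZFS reports an entry as "dataset:<0x...>" when it cannot resolve the object to a path; mapping such an object back to a Lustre file is not shown here.

```python
import re

HEADER = "errors: Permanent errors have been detected"

# Hypothetical sample; the real list is missing from the posted output.
SAMPLE = """\
errors: Permanent errors have been detected in the following files:

        ost-007/ost0:<0x1a2b>
        <metadata>:<0x0>
"""


def permanent_error_entries(status_text):
    """Return the entries listed after the permanent-errors header."""
    entries = []
    in_list = False
    for line in status_text.splitlines():
        if line.startswith(HEADER):
            in_list = True
            continue
        if not in_list:
            continue
        entry = line.strip()
        if entry:
            entries.append(entry)
        elif entries:
            break  # a blank line after at least one entry ends the list
    return entries


def is_object_reference(entry):
    """True when the entry is an unresolved object ID rather than a path."""
    return re.search(r":<0x[0-9a-fA-F]+>$", entry) is not None


if __name__ == "__main__":
    for entry in permanent_error_entries(SAMPLE):
        kind = "object reference" if is_object_reference(entry) else "path"
        print(f"{kind}: {entry}")
```

An empty result, as with the output quoted above, means the list was not captured, not that no file is affected.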

Thanks for any help/advice.
