[lustre-discuss] Error on a zpool underlying an OST

Kevin Abbey kevin.abbey at rutgers.edu
Mon Jul 11 21:02:12 PDT 2016


Hi,

Can anyone advise how to clean up thousands of ZFS-level permanent errors, and 
the corresponding errors at the Lustre level?

A similar question was asked on the list earlier, but I did not see an answer:
https://www.mail-archive.com/lustre-discuss@lists.lustre.org/msg12454.html
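
For context, the cleanup sequence I had in mind (but have not run yet) is 
roughly the following.  The ZFS part assumes the damaged objects can be 
repaired or dropped and then re-verified; the Lustre fsname below is a 
placeholder:

   # On the OSS, with the OST unmounted and on the good HBA:
   ~]# zpool scrub test-ost4        # let a full scrub complete
   ~]# zpool clear test-ost4        # reset the per-device error counters
   ~]# zpool scrub test-ost4        # a second clean scrub should age out the old error log

   # After remounting, run a layout LFSCK from the MDS to check/repair the OST objects:
   ~]# lctl lfsck_start -M <fsname>-MDT0000 -t layout -A
   ~]# lctl get_param mdd.<fsname>-MDT0000.lfsck_layout

Is that the right general approach, or is there a better way to reconcile the 
OST objects with the MDT?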

While testing new hardware I discovered that an LSI HBA was bad.  On a single 
combined MDS/OSS there were 8 OSTs split across 2 JBODs and 2 LSI HBAs.  The 
MDT was on a 3rd JBOD, daisy-chained off the JBOD attached to the bad 
controller.  After unmounting and stopping Lustre, the zpools connected to the 
good HBA scrubbed clean.  The zpools on the bad controller continued to 
accumulate errors while attached to it.  One of these OSTs reported a disk 
failure during the scrub and began resilvering to the spare, even though 
autoreplace was off.  This is a very bad combination, considering the card was 
causing all of the errors, and neither the scrub nor the resilver would ever 
complete.  I stopped the scrubs on the 3 other OSTs and detached the spare 
from the OST that was resilvering.

After narrowing the problem down to the bad HBA (initially it was not clear 
whether the cables or the JBOD backplanes were at fault), I used the good HBA 
to scrub JBOD 1 again, then shut down and disconnected JBOD 1.  I then 
connected JBOD 2 to the good controller to scrub the JBOD 2 zpools that had 
previously been attached to the bad LSI controller.  The 3 zpools whose scrubs 
I had stopped earlier completed successfully.  The one that had begun 
resilvering started resilvering again after I initiated a replace of the 
failed disk with the spare.  The resilver completed, but many permanent errors 
were found on the zpool.  Since this is a test pool, I was interested to see 
whether ZFS would recover.  In a real scenario with hardware problems I will 
shut down and disconnect the data drives before any hardware testing.
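
For the record, the commands I used were roughly the following (pool and 
device names other than test-ost4 are abbreviated placeholders):

   ~]# zpool scrub -s test-ostN                     # stop the in-progress scrub; repeated for the other 3 OSTs
   ~]# zpool detach test-ost4 ata-..._SPARE         # return the in-use spare to the spare list
   ~]# zpool scrub test-ost4                        # re-scrub after moving to the good HBA
   ~]# zpool replace test-ost4 ata-..._FAILED ata-..._SPARE   # then resilver onto the spare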

The status listed below shows a new scrub in progress, started after the 
resilver completed.  The cache device is UNAVAIL because the 3rd JBOD is 
temporarily disconnected.


===================================

ZFS:    v0.6.5.7-1
Lustre: 2.8.55
Kernel: 2.6.32-642.1.1.el6.x86_64
OS:     CentOS 6.8


===================================
   ~]# zpool status -v test-ost4
   pool: test-ost4
  state: ONLINE
status: One or more devices has experienced an error resulting in data
     corruption.  Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
     entire pool from backup.
    see: http://zfsonlinux.org/msg/ZFS-8000-8A
   scan: scrub in progress since Mon Jul 11 22:29:09 2016
     689G scanned out of 12.4T at 711M/s, 4h49m to go
     40K repaired, 5.41% done
config:

     NAME                                       STATE READ WRITE CKSUM
     test-ost4                                  ONLINE 0     0   180
       raidz2-0                                 ONLINE 0     0   360
         ata-ST4000NM0033-9ZM170_Z1Z7GYXY       ONLINE 0     0     2  (repairing)
         ata-ST4000NM0033-9ZM170_Z1Z7KKPQ       ONLINE 0     0     3  (repairing)
         ata-ST4000NM0033-9ZM170_Z1Z7L5E7       ONLINE 0     0     3  (repairing)
         ata-ST4000NM0033-9ZM170_Z1Z7KGQT       ONLINE 0     0     0  (repairing)
         ata-ST4000NM0033-9ZM170_Z1Z7LA8K       ONLINE 0     0     4  (repairing)
         ata-ST4000NM0033-9ZM170_Z1Z7KB0X       ONLINE 0     0     3  (repairing)
         ata-ST4000NM0033-9ZM170_Z1Z7JSMN       ONLINE 0     0     2  (repairing)
         ata-ST4000NM0033-9ZM170_Z1Z7KXRA       ONLINE 0     0     2  (repairing)
         ata-ST4000NM0033-9ZM170_Z1Z7MLSN       ONLINE 0     0     2  (repairing)
         ata-ST4000NM0033-9ZM170_Z1Z7L4DT       ONLINE 0     0     7  (repairing)
     cache
       ata-D2CSTK251M20-0240_A19CV011227000092  UNAVAIL 0     0     0

errors: Permanent errors have been detected in the following files:

         test-ost4/test-ost4:<0xe00>
         test-ost4/test-ost4:<0xe01>
         test-ost4/test-ost4:<0xe02>
         test-ost4/test-ost4:<0xe03>
         test-ost4/test-ost4:<0xe04>
         test-ost4/test-ost4:<0xe05>
         test-ost4/test-ost4:<0xe06>
         ... (list continues) ...
         test-ost4/test-ost4:<0xdfe>
         test-ost4/test-ost4:<0xdff>
===================================
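
In case it matters for the cleanup, my plan for mapping the object numbers in 
that list back to Lustre files was something like the following (untested, and 
the FID xattr layout may differ between versions; the client mount point is a 
placeholder):

   # Dump one of the damaged objects; 0xe00 is 3584 in decimal:
   ~]# zdb -dddd test-ost4/test-ost4 3584

   # The object's FID xattr should identify the owning Lustre file, which a
   # client can then resolve to a path:
   client]# lfs fid2path /mnt/<fsname> <FID-from-zdb>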

Follow-up questions:

Is it better not to have a spare attached to the pool, to prevent resilvering 
in this scenario?  (Bad HBA, a disk failed during the scrub, the resilver 
began even though autoreplace was off; the spare was assigned to the zpool.)
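
For now I was thinking of simply pulling the spare out of the pool and 
double-checking the property until the hardware is trustworthy, e.g.:

   ~]# zpool get autoreplace test-ost4
   ~]# zpool set autoreplace=off test-ost4
   ~]# zpool remove test-ost4 ata-..._SPARE     # remove the hot spare from the pool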

With dual paths to the JBOD, would the bad HBA be disabled automatically so 
that I/O errors never reach the disks?  The current setup is single-path only.
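
If dual-pathing is the answer, I assume it would be dm-multipath on top of the 
two HBAs, with something like this to confirm that paths through a failing 
card get marked faulty (a sketch only; I have not set this up yet):

   ~]# multipath -ll                   # show both paths per disk and their state
   ~]# multipathd -k"show paths"       # live path status from the daemon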


Thank you in advance for any notes,
Kevin

-- 
Kevin Abbey
Systems Administrator
Center for Computational and Integrative Biology (CCIB)
http://ccib.camden.rutgers.edu/

Rutgers University - Science Building
315 Penn St.
Camden, NJ 08102
Telephone: (856) 225-6770
Fax:(856) 225-6312
Email: kevin.abbey at rutgers.edu


