[Lustre-discuss] ZFS question: HW raid5 vs raidz?

Andreas Dilger adilger at whamcloud.com
Fri Jun 10 13:22:36 PDT 2011


On 2011-06-10, at 10:58 AM, David Noriega wrote:
> I was checking out zfsonlinux.org to see how things have been going
> lately and I had a question. What's the difference, or what's better:
> use a hardware RAID-5 (or 6), or use ZFS to create a raidz pool? In terms
> of Lustre, is one preferred over another?

ZFS much prefers to have direct access to the individual disks in a JBOD,
instead of via h/w RAID-5/6.  There are several reasons:

- it "knows" where the data and parity are located, and if there is an
  error reading data from disk it can retry with different data/parity
  combinations until the checksum matches, even trying single-bit error
  recovery in extreme cases
- it is easier to locate multiple copies of the metadata on different
  disks if it has direct access to the individual disks
- it has more IO queues and can schedule IO better for individual disks,
  keeping the IO queue relatively shallow so that read latency isn't hurt
- pooled storage, in theory, allows all space/bandwidth to be used by any
  thread doing IO.  In practice this doesn't perform as well as in theory.
- no read-modify-write when writing "partial block" data (there isn't really
  such a thing as a "partial block write" for RAID-Z)
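
To make the first point concrete, here is a minimal sketch of the
"retry with different data/parity combinations" idea.  It is a toy
single-parity model, not the actual ZFS code: the function names, the
plain XOR parity, and the use of SHA-256 as the block checksum are all
assumptions made for illustration.  The point is only that a checksum
stored apart from the data lets the reader work out which column is bad
even when no disk reports an error.

```python
import hashlib
from functools import reduce


def xor_bytes(chunks):
    """XOR a list of equal-length byte strings together."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))


def checksum(data):
    """Stand-in block checksum (assumption: SHA-256, for illustration)."""
    return hashlib.sha256(data).digest()


def read_block(columns, parity, expected):
    """Return (data, repaired_index); repaired_index is None if data was clean.

    columns  -- per-disk data columns as read from the disks
    parity   -- XOR parity column as read from the parity disk
    expected -- checksum stored separately from the block itself
    """
    data = b"".join(columns)
    if checksum(data) == expected:
        return data, None
    # Checksum mismatch: assume each data column in turn is the bad one,
    # rebuild it from the remaining columns plus parity, and re-verify.
    for bad in range(len(columns)):
        others = columns[:bad] + columns[bad + 1:]
        rebuilt = xor_bytes(others + [parity])
        candidate = b"".join(columns[:bad] + [rebuilt] + columns[bad + 1:])
        if checksum(candidate) == expected:
            return candidate, bad
    raise IOError("unrecoverable: no combination matches the checksum")


# Hypothetical example: three data disks, silent corruption on disk 1.
cols = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_bytes(cols)
expected = checksum(b"".join(cols))
damaged = [cols[0], b"BxBB", cols[2]]
data, repaired = read_block(damaged, parity, expected)
```

In this toy run the first read fails verification, the retry that
rebuilds column 1 from the other columns and parity passes, and the
original data is returned.  A h/w RAID-5 controller hides the individual
columns behind a single LUN, so the filesystem above it has no way to
attempt this kind of retry.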

The main drawback is that RAID-Z needs a lot more effort when rebuilding
a failed disk compared to a normal RAID-5/6.  ZFS proponents will claim
that "it only needs to rebuild the used parts of the filesystem", but
most HPC filesystems are kept 70-80% full, so the RAID-Z overhead wipes
out any advantage gained by not rebuilding the 20-30% of unused space.
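
As a rough back-of-the-envelope check of that trade-off, the sketch
below assumes a deliberately simple model that is not taken from any
measurement: rebuild time is proportional to the fraction of the disk
that has to be rebuilt, multiplied by a per-byte overhead factor for
RAID-Z relative to a sequential RAID-5/6 rebuild.  Both function names
and the overhead factor are hypothetical.

```python
def relative_rebuild_time(fill_fraction, overhead_factor):
    """RAID-Z rebuild time relative to a full-disk RAID-5/6 rebuild.

    1.0 means the two take the same time (hypothetical linear model).
    """
    return fill_fraction * overhead_factor


def break_even_overhead(fill_fraction):
    """Per-byte overhead at which RAID-Z loses its used-space advantage."""
    return 1.0 / fill_fraction


for fill in (0.70, 0.80):
    factor = break_even_overhead(fill)
    print(f"{fill:.0%} full: break-even overhead = {factor:.2f}x")
```

Under this model, at 70-80% full the used-space-only advantage is gone
as soon as RAID-Z costs roughly 1.25x to 1.43x as much per rebuilt byte
as a plain sequential rebuild.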

See zfsonlinux.org/docs/SC10_BoF_ZFS_on_Linux_for_Lustre.pdf for some
performance comparisons.


Cheers, Andreas
--
Andreas Dilger 
Principal Engineer
Whamcloud, Inc.





