[Lustre-discuss] ZFS question: HW raid5 vs raidz?
adilger at whamcloud.com
Fri Jun 10 13:22:36 PDT 2011
On 2011-06-10, at 10:58 AM, David Noriega wrote:
> I was checking out zfsonlinux.org to see how things have been going
> lately and I had a question. Whats the difference, or whats better:
> Use a hardware raid5(or 6) or use zfs to create a raidz pool? In terms
> of Lustre, is one preferred over another?
ZFS much prefers to have direct access to the individual disks in a JBOD,
instead of via h/w RAID-5/6. There are several reasons:
- it "knows" where the data and parity are located, and if there is an
error reading data from disk it can retry with different data/parity
combinations until the checksum matches, even trying single-bit error
recovery in extreme cases
- it is easier to place multiple copies of the metadata on different
disks if it has direct access to the individual disks
- it has more IO queues and can schedule IO better for individual disks,
keeping the IO queue relatively shallow so that read latency isn't hurt
- pooled storage, in theory, allows all space/bandwidth to be used by any
thread doing IO. In practice this doesn't perform as well as in theory.
- no read-modify-write when writing "partial block" data (there isn't really
such a thing as a "partial block write" for RAID-Z)
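
To make the first point above concrete, here is a minimal sketch of
checksum-driven reconstruction. It is not ZFS code: it assumes a single
XOR parity column, uses SHA-256 as a stand-in checksum, and the names
xor_blocks, checksum and read_with_repair are hypothetical. It only
illustrates the idea of retrying data/parity combinations until the
checksum matches, and leaves out the single-bit recovery case.

```python
import hashlib
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def checksum(columns):
    """Stand-in block checksum (assumption: SHA-256 over all data columns)."""
    return hashlib.sha256(b"".join(columns)).digest()


def read_with_repair(data_cols, parity, expected):
    """Return data columns whose checksum equals `expected`, else None."""
    if checksum(data_cols) == expected:
        return list(data_cols)
    # Suspect each data column in turn and rebuild it from the
    # remaining columns plus parity, keeping the first combination
    # that satisfies the checksum.
    for bad in range(len(data_cols)):
        others = [c for i, c in enumerate(data_cols) if i != bad]
        candidate = list(data_cols)
        candidate[bad] = xor_blocks(others + [parity])
        if checksum(candidate) == expected:
            return candidate
    return None  # more damage than single parity can repair


good = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(good)
expected = checksum(good)

# Silent corruption of the second column: the disk reports no error.
damaged = [good[0], b"BxBB", good[2]]
repaired = read_with_repair(damaged, parity, expected)
```

A h/w RAID-5 controller in the same situation has no end-to-end
checksum to test candidates against, so it cannot tell which column
is the bad one.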
The main drawback is that RAID-Z needs a lot more effort when rebuilding
a failed disk compared to a normal RAID-5/6. ZFS proponents will claim
that "it only needs to rebuild the used parts of the filesystem", but
most HPC filesystems are kept 70-80% full, so the RAID-Z overhead wipes
out any advantage gained by not rebuilding the 20% of unused space.
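
As a rough back-of-envelope for the fullness argument: assume rebuild
time is proportional to (bytes rebuilt) x (relative cost per byte), with
a conventional RAID-5/6 rebuild of the whole disk taken as 1.0. Both the
linear model and the function name break_even_cost are assumptions made
for illustration, not measurements. Under that model, a rebuild that
only touches used space wins only while its per-byte cost stays below
1 / fill_fraction.

```python
def break_even_cost(fill_fraction):
    """Largest relative per-byte rebuild cost at which rebuilding only
    the used space still ties with rebuilding the whole disk.

    Assumes time = bytes_rebuilt * cost_per_byte (a simplification).
    """
    if not 0 < fill_fraction <= 1:
        raise ValueError("fill_fraction must be in (0, 1]")
    return 1.0 / fill_fraction


for fill in (0.7, 0.8):
    print(f"{fill:.0%} full: break-even relative cost "
          f"{break_even_cost(fill):.2f}x")
```

So at 80% full the used-space-only rebuild can afford to be at most
1.25x as expensive per byte, and at 70% full about 1.43x, before the
advantage is gone.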
See zfsonlinux.org/docs/SC10_BoF_ZFS_on_Linux_for_Lustre.pdf for some