[Lustre-discuss] mkfs options/tuning for RAID based OSTs
Andreas Dilger
andreas.dilger at oracle.com
Tue Oct 19 16:16:11 PDT 2010
On 2010-10-19, at 14:42, Edward Walter wrote:
> We're doing a fresh Lustre 1.8.4 install using Sun StorageTek 2540
> arrays for our OST targets. We've configured these as RAID6 with no
> spares which means we have the equivalent of 10 data disks and 2 parity
> disks in play on each OST.
As Paul mentioned, using something other than 8 data + N parity disks is bad for performance. It is doubly bad if the stripe width (ndata * segment size) is > 1MB, because that means EVERY WRITE will be a read-modify-write, which kills performance.
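To make the stripe-width arithmetic concrete, here is a rough sketch. It assumes writes arrive as aligned 1MB chunks, and the segment sizes (64KiB and 128KiB) are only illustrative values, not the settings of any particular array. The function names are made up for this example.

```python
MIB = 1024 * 1024

def stripe_width_bytes(ndata, segment_kib):
    """Full RAID stripe width: data disks times the per-disk segment size."""
    return ndata * segment_kib * 1024

def is_full_stripe_write(ndata, segment_kib, io_bytes=MIB):
    """True if an aligned write of io_bytes covers whole stripes only.

    Assumption: the write starts on a stripe boundary. If it does not
    divide evenly into stripes, the RAID must read-modify-write.
    """
    return io_bytes % stripe_width_bytes(ndata, segment_kib) == 0

if __name__ == "__main__":
    # Illustrative geometries: 8 vs. 10 data disks, two segment sizes.
    for ndata, segment_kib in [(8, 128), (8, 64), (10, 128), (10, 64)]:
        width_kib = stripe_width_bytes(ndata, segment_kib) // 1024
        if is_full_stripe_write(ndata, segment_kib):
            verdict = "full-stripe write"
        else:
            verdict = "read-modify-write"
        print(f"{ndata} data x {segment_kib}KiB = {width_kib}KiB stripe: {verdict}")
```

With 10 data disks no power-of-two segment size divides 1MB evenly, which is why 8 data disks is the friendlier geometry.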
> Also, does anyone have recommendations for "aligning" the filesystem so
> that the fs blocks align with the RAID chunks? We've done things like
> this for SSD drives. We'd normally give Lustre the entire RAID device
> (without partitions) so this hasn't been an issue in the past. For this
> installation though; we're creating multiple volumes (for size/space
> reasons) so partitioning is a necessary evil now.
Partitioning is doubly evil (unless done extremely carefully) because it will further misalign the IO (due to the partition table and the crazy MS-DOS odd-sector alignment), so that you will always partially modify extra blocks at the beginning/end of each write (possibly causing data corruption in case of incomplete writes/cache loss/etc).
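A quick way to see the misalignment is to check whether the partition's starting byte offset is a multiple of the stripe width. The sketch below assumes 512-byte sectors and a 1MiB stripe (8 data disks x 128KiB, an illustrative geometry). Start sector 63 is the traditional MS-DOS default and 2048 is shown for contrast. The helper name is hypothetical.

```python
SECTOR_BYTES = 512  # assumption: 512-byte logical sectors

def partition_misalignment(start_sector, stripe_bytes, sector_bytes=SECTOR_BYTES):
    """Bytes by which the partition start sits past a stripe boundary.

    A result of 0 means the partition begins exactly on a stripe boundary.
    """
    return (start_sector * sector_bytes) % stripe_bytes

if __name__ == "__main__":
    stripe = 8 * 128 * 1024  # illustrative: 8 data disks x 128KiB segments
    for start in (63, 2048):
        off = partition_misalignment(start, stripe)
        state = "aligned" if off == 0 else f"misaligned by {off} bytes"
        print(f"partition starting at sector {start}: {state}")
```

Any nonzero offset shifts every filesystem write relative to the RAID stripes, so even a write sized to exactly one stripe ends up touching two.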
If you stick with 8 data disks, and assuming 2TB drives or smaller, with 1.8.4 you can use the ext4-based ldiskfs (in a separate ldiskfs RPM on the download site) to format up to 16TB LUNs for a single OST. That is really the best configuration, and will probably double your write performance.
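The sizing can be checked with a little arithmetic. This sketch assumes the 16TB limit corresponds to 2^32 filesystem blocks of 4KiB each (i.e. 16TiB), and that drive capacities are quoted in decimal terabytes. Both are assumptions for illustration, as are the function names.

```python
TB = 10**12                    # drive vendors quote decimal terabytes
LUN_LIMIT_BYTES = 2**32 * 4096 # assumption: 32-bit block numbers, 4KiB blocks

def lun_bytes(ndata, drive_tb):
    """Usable LUN capacity: only the data disks count, not parity."""
    return ndata * drive_tb * TB

def fits_single_ost(ndata, drive_tb, limit=LUN_LIMIT_BYTES):
    """True if the whole LUN fits under the assumed per-OST size limit."""
    return lun_bytes(ndata, drive_tb) <= limit

if __name__ == "__main__":
    for ndata, drive_tb in [(8, 2), (10, 2), (8, 1)]:
        size_tb = lun_bytes(ndata, drive_tb) // TB
        verdict = "fits" if fits_single_ost(ndata, drive_tb) else "too large"
        print(f"{ndata} data disks x {drive_tb}TB = {size_tb}TB LUN: {verdict}")
```

Under these assumptions 8 x 2TB fits in one OST, while 10 x 2TB would have to be split into multiple volumes.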
Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.