[Lustre-discuss] mkfs options/tuning for RAID based OSTs

Andreas Dilger andreas.dilger at oracle.com
Tue Oct 19 16:16:11 PDT 2010


On 2010-10-19, at 14:42, Edward Walter wrote:
> We're doing a fresh Lustre 1.8.4 install using Sun StorageTek 2540 
> arrays for our OST targets.  We've configured these as RAID6 with no 
> spares which means we have the equivalent of 10 data disks and 2 parity 
> disks in play on each OST.

As Paul mentioned, using anything other than 8 data + N parity disks is bad for performance.  It is doubly bad if the stripe width (ndata * segment size) is > 1MB, because then EVERY WRITE becomes a read-modify-write, which kills performance.
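
To make the arithmetic concrete, here is a rough sketch of the stripe width calculation.  It assumes the 1MB figure is the size of the IO that reaches the OST, and that a write avoids read-modify-write only when it covers whole stripes.  The segment sizes (64/128/256 KiB) and the function names are illustrative assumptions, not values taken from the 2540 configuration.

```python
KIB = 1024
MIB = 1024 * KIB

def stripe_width(ndata, segment_bytes):
    """Full stripe width in bytes: data disks times per-disk segment size."""
    return ndata * segment_bytes

def is_full_stripe_write(ndata, segment_bytes, io_bytes=MIB):
    """True if a stripe-aligned write of io_bytes covers only whole stripes."""
    return io_bytes % stripe_width(ndata, segment_bytes) == 0

for ndata in (8, 10):
    for seg_kib in (64, 128, 256):  # hypothetical segment sizes
        seg = seg_kib * KIB
        width = stripe_width(ndata, seg)
        verdict = "full-stripe" if is_full_stripe_write(ndata, seg) else "read-modify-write"
        print(f"{ndata} data disks x {seg_kib} KiB = {width // KIB} KiB stripe: {verdict}")
```

With 10 data disks none of these segment sizes divides 1 MiB evenly, while 8 data disks with 64 KiB or 128 KiB segments does.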

> Also, does anyone have recommendations for "aligning" the filesystem so 
> that the fs blocks align with the RAID chunks?  We've done things like 
> this for SSD drives.  We'd normally give Lustre the entire RAID device 
> (without partitions) so this hasn't been an issue in the past.  For this 
> installation though; we're creating multiple volumes (for size/space 
> reasons) so partitioning is a necessary evil now.

Partitioning is doubly evil (unless done extremely carefully) because it further misaligns the IO (due to the partition table and the crazy MS-DOS odd-sector alignment), so that every write partially modifies extra blocks at its beginning and end (possibly causing data corruption in case of incomplete writes, cache loss, etc.).
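
If partitions cannot be avoided, a quick way to check a partition is to compare its starting byte offset against the full stripe width.  The sketch below assumes 512-byte sectors and a 1 MiB stripe (8 data disks x 128 KiB segments); sector 63 is the traditional MS-DOS start of the first partition.  The helper names are made up for illustration.

```python
SECTOR_BYTES = 512          # assumed logical sector size
STRIPE_BYTES = 1024 * 1024  # assumed full stripe: 8 data disks x 128 KiB

def misalignment(start_sector, stripe_bytes=STRIPE_BYTES, sector_bytes=SECTOR_BYTES):
    """Bytes by which the partition start overshoots a stripe boundary (0 = aligned)."""
    return (start_sector * sector_bytes) % stripe_bytes

def next_aligned_sector(start_sector, stripe_bytes=STRIPE_BYTES, sector_bytes=SECTOR_BYTES):
    """Smallest sector >= start_sector that falls on a stripe boundary."""
    sectors_per_stripe = stripe_bytes // sector_bytes
    # Ceiling division, then scale back up to a sector number.
    return -(-start_sector // sectors_per_stripe) * sectors_per_stripe

for start in (63, 2048):
    print(f"start sector {start}: off by {misalignment(start)} bytes, "
          f"next aligned sector {next_aligned_sector(start)}")
```

A partition starting at sector 63 is 32256 bytes past a stripe boundary, so every stripe-sized write straddles two stripes; starting at sector 2048 lines up under these assumptions.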

If you stick with 8 data disks, and assuming 2TB drives or smaller, with 1.8.4 you can use the ext4-based ldiskfs (in a separate ldiskfs RPM on the download site) to format up to 16TB LUNs for a single OST.  That is really the best configuration, and will probably double your write performance.
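
The size limit can be sanity-checked the same way.  This sketch assumes the 16TB limit means 2^32 blocks of 4 KiB (16 TiB) and that drive capacities are quoted in decimal terabytes; both are assumptions on top of the numbers given above.

```python
TB = 10**12                 # decimal terabyte, as drive vendors quote it
LUN_LIMIT = 2**32 * 4096    # assumed limit: 32-bit block numbers, 4 KiB blocks

def lun_fits(ndata, drive_bytes, limit=LUN_LIMIT):
    """True if the usable capacity (data disks only) fits in a single LUN."""
    return ndata * drive_bytes <= limit

for ndata in (8, 10):
    usable = ndata * 2 * TB
    print(f"{ndata} data disks x 2 TB = {usable / 2**40:.2f} TiB, fits: {lun_fits(ndata, 2 * TB)}")
```

Under these assumptions 8 x 2TB fits in one LUN, while 10 x 2TB does not and would have to be split into multiple volumes.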

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.



