[Lustre-discuss] mkfs options/tuning for RAID based OSTs

Brian J. Murrell brian.murrell at oracle.com
Wed Oct 20 05:27:10 PDT 2010


On Tue, 2010-10-19 at 21:00 -0400, Edward Walter wrote: 

Ed,

> That seems to validate how I'm interpreting the parameters. We have 10 data disks and 2 parity disks per array so it looks like we need to be at 64 KB or less.

I think you have been missing everyone's point in this thread.  The
magic value is not "anything below 1MB", it's 1MB exactly.  No more, no
less (although I guess technically 256KB or 512KB would work, since
those divide evenly into 1MB).

The reason is that Lustre attempts to package up I/Os from the client to
the OST in 1MB chunks.  If the RAID stripe matches that 1MB then when
the OSS writes that 1MB to the OST, it's a single write to the RAID disk
underlying the OST of 1MB of data plus the parity.

Conversely, if the OSS receives 1MB of data for the OST and the RAID
stripe under the OST is not 1MB, but less (say 640KB, which is what 10
data disks at 64KB each gives you), then the first <raid_stripe_size>
of that 1MB will be written as data+parity to the first stripe, but the
remaining 1MB-<raid_stripe_size> of data from the client will be written
into the next RAID stripe, only partially filling it.  That causes the
RAID layer to have to first read that whole stripe, insert the new data,
calculate a new parity and then write that whole RAID stripe back out
to the disk.

So as you can see, when your RAID stripe is not exactly 1MB, the RAID
code has to do a lot more I/O, which impacts performance, obviously.
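
To make the arithmetic concrete, here is a rough Python sketch (not
anything from Lustre or the RAID code; split_write is a made-up helper)
that counts how many full stripes a write covers and how many bytes land
in partially filled stripes.  It assumes the write starts on a stripe
boundary unless you pass a different offset.

```python
KIB = 1024
MIB = 1024 * KIB

def split_write(offset, length, stripe_size):
    """Return (full_stripes, partial_bytes) for one write.

    full_stripes  -- stripes completely covered by the write
    partial_bytes -- bytes landing in stripes only partly covered,
                     which is where read-modify-write would occur
    """
    full = 0
    partial = 0
    pos = offset
    end = offset + length
    while pos < end:
        # End of the stripe that contains the current position.
        stripe_end = (pos // stripe_size + 1) * stripe_size
        seg = min(end, stripe_end) - pos
        if seg == stripe_size:
            full += 1
        else:
            partial += seg
        pos += seg
    return full, partial

print(split_write(0, MIB, MIB))        # (1, 0)
print(split_write(0, MIB, 512 * KIB))  # (2, 0)
print(split_write(0, MIB, 640 * KIB))  # (1, 393216)
```

The last case is the 10 data disk, 64KB layout: one full 640KB stripe,
then 384KB left over in a stripe that is only partially filled.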

This is why the recommendation in this thread has continued to be to
use a number of data disks that divides evenly into 1MB (i.e. powers
of 2: 2, 4, 8, etc.).  So for RAID6: 4+2 or 8+2, etc.
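
As a quick check of that rule, the following Python sketch works out
the per-disk chunk size needed for a full stripe of exactly 1MB, given
a number of data disks.  The function name is hypothetical, and it only
does the division; whether your RAID controller actually offers the
resulting chunk size is a separate question.

```python
MIB = 1024 * 1024

def chunk_for_full_stripe(data_disks, target=MIB):
    """Per-disk chunk size in bytes so that data_disks * chunk == target.

    Returns None when target does not divide evenly across the data
    disks, i.e. no chunk size gives a stripe of exactly `target` bytes.
    """
    if target % data_disks != 0:
        return None
    return target // data_disks

for disks in (2, 4, 8, 10):
    chunk = chunk_for_full_stripe(disks)
    if chunk is None:
        print(disks, "data disks: no chunk size gives exactly 1MB")
    else:
        print(disks, "data disks:", chunk // 1024, "KB per disk")
```

With 8 data disks this gives 128KB per disk; with 10 data disks there
is no whole-byte chunk size that adds up to exactly 1MB.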

b.
