[lustre-discuss] Stripe size for osts

Dilger, Andreas andreas.dilger at intel.com
Mon Mar 21 18:23:32 PDT 2016

On Mar 21, 2016, at 15:50, Pawel Dziekonski <dzieko at wcss.pl> wrote:
>> On Mon, 21 Mar 2016 at 09:24:02 +0000, Dilger, Andreas wrote:
>>> On 2016/03/18, 12:52, "Kurt Strosahl" <strosahl at jlab.org> wrote:
>>> Good Afternoon,
>>>   I'm experimenting with ost configurations geared more towards small
>>> files and operations on those small files (like source code, and
>>> compiling), and I was wondering about changing the stripe size so that
>>> small files fit more efficiently on an ost.  I believe that would be the
>>> --param lov.stripesize=XX option for mkfs.lustre, is that correct?  And
>>> is there a lower limit that I should know about?
>> Just to clarify, the stripe size for Lustre is not a property of the OST,
>> but rather a property of each file.  The OST itself allocates space
>> internally as it sees fit.  For ldiskfs, space allocation is done in units
>> of 4KB blocks managed in extents, while ZFS has variable block sizes (512
>> bytes up to 1MB or more, but only one block size per file) managed in a
>> tree.  In both cases, if a file is sparse then no blocks are allocated for
>> the holes in the file.
>> As for the minimum stripe size, this is 64KB, since it isn't possible to
>> have a stripe size below the PAGE_SIZE on the client, and some
>> architectures (e.g. IA64, PowerPC, Alpha) allowed 64KB PAGE_SIZE.
>> For small files, the stripe_size parameter is virtually meaningless, since
>> the data will never exceed a single stripe in size.  What is much more
>> important is to use a stripe_count=1, so that the client doesn't have to
>> query multiple OSTs to determine the file size, timestamps, and other
>> attributes.
> Andreas,
> default stripe size is 1MB. Is there a reason for that?
> P

Yes. The underlying RAID hardware is usually configured as RAID-6 8+2 with a 1MB stripe width, so 1MB write RPCs avoid read-modify-write, and 1MB reads align properly with the allocation size that the filesystem used when the data was written.
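
As a rough illustration of the alignment argument, here is a small Python sketch (not Lustre or RAID controller code). It assumes that "1MB stripe width" means 1 MiB of data per full RAID stripe, i.e. 8 data disks with 128 KiB chunks each; the helper name needs_read_modify_write is hypothetical. A write that starts on a full-stripe boundary and covers whole stripes lets the controller compute parity from the new data alone, while any partial stripe forces it to read old data or parity first.

```python
# Sketch only: checks whether a write covers whole RAID stripes.
# Assumption: "1MB stripe width" = 1 MiB of data per full stripe,
# spread over the 8 data disks of a RAID-6 8+2 set (128 KiB chunks).

KIB = 1024
MIB = 1024 * KIB

DATA_DISKS = 8
FULL_STRIPE = 1 * MIB
CHUNK = FULL_STRIPE // DATA_DISKS  # 128 KiB per data disk


def needs_read_modify_write(offset: int, length: int,
                            full_stripe: int = FULL_STRIPE) -> bool:
    """Return True if the write touches any RAID stripe only partially."""
    if length <= 0:
        raise ValueError("length must be positive")
    # A write avoids read-modify-write only if it starts on a stripe
    # boundary and its length is a whole number of stripes.
    return offset % full_stripe != 0 or length % full_stripe != 0


if __name__ == "__main__":
    for offset, length in [(0, MIB), (0, 64 * KIB), (512 * KIB, MIB)]:
        rmw = needs_read_modify_write(offset, length)
        print(f"offset={offset} length={length} read-modify-write={rmw}")
```

Under these assumptions, an aligned 1 MiB write is a full-stripe write, whereas a 64 KiB write or a 1 MiB write starting at a 512 KiB offset is not.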

Cheers, Andreas
