[Lustre-discuss] Small files

Dilger, Andreas andreas.dilger at intel.com
Wed Jul 3 13:38:47 PDT 2013


On 2013/03/07 5:51 AM, "Nikolay Kvetsinski" <nkvecinski at gmail.com> wrote:

>Thanks mate, what you say makes sense. Anyway, I'm dealing with different
>file sizes, varying from KBs to GBs. Luckily some of the files are
>grouped in folders; for example, in folder /fs/X all files will be 9MB,
>which I'll not stripe. In /fs/Y all files will be GBs, which I'll stripe
>among all OSTs. I guess the problem now is to figure out the optimal
>stripe size/stripe count for files in the range from a couple of hundred
>MBs to GBs. On top of that, what I said about files of similar size being
>put in their own directory might not always be true. Unfortunately I
>can't "teach" users to think before they generate their files.

Note that there is also no benefit to striping files over multiple OSTs if
there is already parallelism at the application level (i.e. multiple
threads reading/writing separate files in parallel from one or more client
nodes).  One thread per CPU can saturate the network interface of the
client, and if this IO goes to multiple OSTs per file then it creates
unnecessary contention and overhead (more locking, RPCs, etc. to manage
multiple objects per file).

You should look at the concurrency of the file access instead of just the
file size to decide what to stripe.  For really large files (e.g. hundreds
of GB+, or anything over 5% of the total OST size or so) you should
probably stripe those over multiple OSTs regardless, just to balance the
space usage; the extra metadata overhead isn't noticeable at this size.
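
As a rough illustration of the rule of thumb above, here is a minimal
Python sketch.  The function names, the writers_per_file parameter, and the
choice of min(writers, OSTs) for shared files are assumptions made for
illustration only; the 5% threshold is the approximate figure mentioned
above, not a hard limit.  The only real command used is "lfs setstripe -c",
where a count of -1 means "all OSTs".

```python
def suggest_stripe_count(file_size_bytes, ost_size_bytes, num_osts,
                         writers_per_file=1, large_fraction=0.05):
    """Suggest a stripe count: 1 = no striping, -1 = stripe over all OSTs.

    writers_per_file is a hypothetical measure of how many threads/clients
    access the SAME file concurrently (not separate files in parallel).
    """
    # Very large files: stripe widely just to balance OST space usage.
    if file_size_bytes >= large_fraction * ost_size_bytes:
        return -1
    # Shared-file access: assumed here to benefit from one OST per writer,
    # capped at the number of OSTs available.
    if writers_per_file > 1:
        return min(writers_per_file, num_osts)
    # One writer per file (parallelism across files): do not stripe.
    return 1


def setstripe_command(directory, stripe_count):
    """Build the argument list for setting a directory's default stripe count."""
    return ["lfs", "setstripe", "-c", str(stripe_count), directory]


if __name__ == "__main__":
    MiB, TiB = 2**20, 2**40
    count = suggest_stripe_count(9 * MiB, 10 * TiB, num_osts=16)
    print(" ".join(setstripe_command("/fs/X", count)))
```

With the assumed 10 TiB OSTs, the 9MB files in /fs/X come out unstriped,
while a file of 1 TiB or a file shared by many writers would be striped.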

Cheers, Andreas

>On Wed, Jul 3, 2013 at 12:18 PM, Kevin Van Maren
><KVanMaren at fusionio.com> wrote:
>
>With a 1MB default stripe size, anything that is <1 MB is a "small" file.
>In general, I would not stripe 10MB files across more than one OST
>(each).
>
>Kevin
>
>
>On Jul 3, 2013, at 3:11 AM, Nikolay Kvetsinski wrote:
>
>> Hello again guys,
>>
>> I constantly read that Lustre is not very good with "small files".
>> However, there is no definition of a small file from Lustre's point of
>> view. Would you be able to draw some borders on what is considered to be
>> a small file? For example, is a 10 MB file considered a small file, and
>> should a directory holding such files be striped over a few OSTs, or
>> should it be striped at all?
>


-- 
Andreas Dilger

Lustre Software Architect
Intel High Performance Data Division




