[lustre-discuss] Doubt about how a file is stored on the OSTs

Dilger, Andreas andreas.dilger at intel.com
Tue May 5 22:17:26 PDT 2015


On 2015/05/05, 10:48 PM, "Prakrati.Agrawal at shell.com"
<Prakrati.Agrawal at shell.com> wrote:
>Thanks for the quick reply.
>For the second question, I am taking the total number of OSTs to be 165.
>So my stripe count is 165, stripe size is 1GB and total file size is 64
>GB.
>I have 64 ranks on 4 nodes.
>Hence, each is writing 1GB.
>Why does my performance degrade then? What is the extra overhead that is
>incurred?

Having 165 stripes of 1GB each when you are only writing 64GB
to the file gives no benefit at all.  The clients will only write to the
first 64 stripes of the file, leaving 101 stripes idle, but the clients
may need to fetch locks and other information about all of the stripes in
the file (depends on what syscalls your application is using).
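
(For reference, the byte-to-stripe mapping is round-robin:
stripe_index = (offset / stripe_size) % stripe_count.  With a 1GB stripe
size, bytes [0, 1GB) land on stripe 0, [1GB, 2GB) on stripe 1, and so
on, so a 64GB file never touches stripes 64..164.)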

You would probably be better off keeping 165 stripes but reducing the
stripe size so that the 64 ranks write to all of the stripes (e.g. 256MB),
or just leaving it at the 1MB default and seeing how that goes.  With only 4
nodes, however, you may not be able to saturate all of the OSTs due to
network bandwidth limits of the clients.
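
For example, something like the following would set that layout before
the file is created (the directory name here is hypothetical; a file's
layout is fixed at creation time, so it needs to be set on the parent
directory, or on the file itself, beforehand):

  # new files created in this directory inherit its default layout
  lfs setstripe -c 165 -S 256M /lustre/scratch/outdir
  # or keep the 1MB default stripe size instead
  lfs setstripe -c 165 -S 1M /lustre/scratch/outdir
  # check the default layout that new files will get
  lfs getstripe /lustre/scratch/outdir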

Having a large number of stripes in a file is only useful if the file is
going to be very large (to distribute space usage) or accessed by many
clients concurrently (to increase total bandwidth).
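
If in doubt, "lfs getstripe" shows the stripe count, stripe size, and
the OST objects backing an existing file, and "lfs df" shows per-OST
capacity and usage (paths below are again just examples):

  lfs getstripe /lustre/scratch/outdir/bigfile
  lfs df -h /lustre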

Cheers, Andreas

>-----Original Message-----
>From: Mohr Jr, Richard Frank (Rick Mohr) [mailto:rmohr at utk.edu]
>Sent: Wednesday, May 06, 2015 10:03 AM
>To: Agrawal, Prakrati PTIN-PTT/ICOE
>Cc: lustre-discuss at lists.lustre.org
>Subject: Re: [lustre-discuss] Doubt about how a file is stored on the OSTs
>
>
>> On May 6, 2015, at 12:05 AM, Prakrati.Agrawal at shell.com wrote:
>> 
>> I am doing some performance benchmarking on Lustre file system. To
>>understand my results, I wanted to know how a file is written on the
>>OSTs.
>> 
>> Following is what I am doing:
>> 
>> I have a file of 64 GB to be written
>> 
>> number of ranks is 64
>> 
>> number of nodes is 4
>> 
>> stripe count is 4
>> 
>> stripe size is 1GB
>> 
>>  
>> 
>> Let the 4 OSTs be OST1, OST2, OST3, and OST4.
>> 
>> Let the nodes be N1, N2, N3 and N4.
>> 
>> So each rank is writing 1 GB to 1 of the 4 OSTs.
>> 
>> What I want to know is: since 16 ranks are writing 1GB each from, say,
>>N1, are all those ranks writing to OST1 only, or might it be the case
>>that, of those ranks, some are writing to OST1, some to OST2, and so on?
>
>The way the file data will be organized is like this:
>
>1st GB -> OST1
>2nd GB -> OST2
>3rd GB -> OST3
>4th GB -> OST4
>5th GB -> OST1
>6th GB -> OST2
>...
>
>Depending upon which sections of the file the 16 processes on node N1 are
>writing to, they may or may not all write to the same OST.  If you are
>using SMP-style placement and assuming that rank N writes the (N+1)st GB
>of data, then each node would have 4 processes writing to each OST.
>
>> Also, if I increase my stripe count (i.e. the number of OSTs used) to
>>the total number of OSTs, but each rank is still writing 1GB and the
>>total number of ranks is 64, why does performance degrade?
>
>It's hard to venture a guess about performance without knowing more about
>the file system (total number of OSTs, total number of servers,
>interconnect technology, etc.).  The I/O pattern of the application itself
>can also play a role.  In general, though, if you had a file with
>stripe_count=64, each process should be writing to its own OST, which
>should reduce contention and improve performance (assuming that there are
>no other applications running concurrently that could affect I/O).
>
>--
>Rick Mohr
>Senior HPC System Administrator
>National Institute for Computational Sciences
>http://www.nics.tennessee.edu
>
>_______________________________________________
>lustre-discuss mailing list
>lustre-discuss at lists.lustre.org
>http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>


-- 
Andreas Dilger

Lustre Software Architect
Intel High Performance Data Division



