[Lustre-discuss] [Lustre-community] Poor multithreaded I/O performance

Kevin Van Maren kevin.van.maren at oracle.com
Tue May 24 07:16:09 PDT 2011


[Moved to Lustre-discuss]


"However, if I spawn 8 threads such that all of them write to the same 
file (non-overlapping locations), without explicitly synchronizing the 
writes (i.e. I dont lock the file handle)"


How exactly does your multi-threaded application write the data?  Are 
you using pwrite() to ensure non-overlapping regions, or are all the 
threads just doing unlocked write() calls on the same fd (each 
transferring size/8)?  If the application divides the file into N 
contiguous pieces, and each thread does pwrite() on its own piece, then 
what each OST sees is multiple write streams at widely separated 
offsets to the same object, which could hurt performance.
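
For concreteness, here is a minimal sketch of that second case (the 
16GByte file split into 8 contiguous pieces, each thread pwrite()ing 
its own piece through one shared fd).  This is only a guess at what 
your test program does; the file name, sizes and identifiers below are 
illustrative:

#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK      (1024 * 1024)            /* 1 MByte per pwrite()     */
#define FILE_SIZE  ((off_t)16 << 30)        /* 16 GByte output file     */
#define NTHREADS   8
#define PIECE      (FILE_SIZE / NTHREADS)   /* contiguous 2 GByte piece */

static int fd;                              /* one fd shared by all threads */

static void *writer(void *arg)
{
    long thread_num = (long)arg;
    char *buf = malloc(CHUNK);
    memset(buf, 'x', CHUNK);

    /* Each thread walks its own contiguous piece.  With a 1 MByte stripe
     * over 48 OSTs, every OST object then sees 8 write streams at widely
     * separated offsets instead of one sequential stream. */
    for (off_t off = thread_num * PIECE;
         off < (thread_num + 1) * PIECE;
         off += CHUNK)
        if (pwrite(fd, buf, CHUNK, off) != CHUNK)
            perror("pwrite");

    free(buf);
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];

    fd = open("/lustre/testfile", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, writer, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);

    close(fd);
    return 0;
}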

If, on the other hand, the file is written sequentially, where each 
thread grabs the next piece to be written (with locking normally used 
for the current_offset value, so you know where each chunk actually 
goes), then you get a more sequential pattern at the OSTs.
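
That pattern only needs a writer function like the following (reusing 
fd, CHUNK and FILE_SIZE from the sketch above; current_offset and the 
mutex are illustrative names, not from your code):

static off_t current_offset;                   /* next chunk to hand out */
static pthread_mutex_t offlock = PTHREAD_MUTEX_INITIALIZER;

static void *writer(void *arg)
{
    (void)arg;                                 /* thread number not needed */
    char *buf = malloc(CHUNK);
    memset(buf, 'x', CHUNK);

    for (;;) {
        /* Claim the next chunk; only this bookkeeping is serialized. */
        pthread_mutex_lock(&offlock);
        off_t off = current_offset;
        current_offset += CHUNK;
        pthread_mutex_unlock(&offlock);

        if (off >= FILE_SIZE)
            break;
        /* The write itself is unlocked: pwrite() carries its own offset. */
        if (pwrite(fd, buf, CHUNK, off) != CHUNK)
            perror("pwrite");
    }
    free(buf);
    return NULL;
}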

If the number of threads maps to the number of OSTs (or some multiple, 
as in your case of 6 OSTs per thread), and each thread "owns" the piece 
of the file that belongs to its OSTs (i.e.: for (offset = thread_num * 6MB; 
offset < size; offset += 48MB) pwrite(fd, buf, 6MB, offset); ), then 
you've eliminated the need for application locks (assuming the use of 
pwrite) and ensured each OST object is being written sequentially.
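
Written out as a thread function (same scaffolding as the first sketch, 
with the thread number passed as the pthread_create() argument; the 
48-stripe, 1 MByte layout and the 6 MByte chunk are taken from the 
numbers in this thread, so treat them as assumptions about your actual 
file):

#define STRIPE_SIZE   ((off_t)1024 * 1024)                       /* 1 MByte stripes */
#define STRIPE_COUNT  48                                         /* stripes == OSTs */
#define OST_CHUNK     ((STRIPE_COUNT / NTHREADS) * STRIPE_SIZE)  /* 6 MBytes        */
#define STRIDE        (STRIPE_COUNT * STRIPE_SIZE)               /* 48 MBytes       */

static void *writer(void *arg)
{
    long thread_num = (long)arg;                /* 0 .. NTHREADS-1 */
    char *buf = malloc(OST_CHUNK);
    memset(buf, 'x', OST_CHUNK);

    /* Each pass writes this thread's 6 MByte slice of a 48 MByte stripe
     * "row", so the 6 OST objects this thread owns are each written
     * sequentially and no application-level lock is needed. */
    for (off_t off = thread_num * OST_CHUNK; off < FILE_SIZE; off += STRIDE)
        if (pwrite(fd, buf, OST_CHUNK, off) != OST_CHUNK)
            perror("pwrite");

    free(buf);
    return NULL;
}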

It's quite possible there is some bottleneck on the shared fd.  So 
perhaps the question is not why you aren't scaling with more threads, 
but why the single file is not able to saturate the client, or why the 
file bandwidth is not scaling with more OSTs.  It is somewhat common for 
multiple processes (on different nodes) to write non-overlapping regions 
of the same file; does performance improve if each thread opens its own 
file descriptor for the same file?
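
A quick way to test that (again just a sketch on top of the earlier 
ones): have each thread open the same file itself and write through its 
private descriptor.  If this gets close to the ~700 MBytes/sec you see 
with separate files, the contention is on the client side of the shared 
fd rather than at the OSTs.

static void *writer(void *arg)
{
    long thread_num = (long)arg;
    int myfd = open("/lustre/testfile", O_WRONLY);   /* private fd per thread */
    if (myfd < 0) { perror("open"); return NULL; }

    char *buf = malloc(OST_CHUNK);
    memset(buf, 'x', OST_CHUNK);

    for (off_t off = thread_num * OST_CHUNK; off < FILE_SIZE; off += STRIDE)
        if (pwrite(myfd, buf, OST_CHUNK, off) != OST_CHUNK)
            perror("pwrite");

    free(buf);
    close(myfd);
    return NULL;
}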

Kevin


Wojciech Turek wrote:
> Ok, so it looks like you have 64 OSTs in total and your output file is 
> striped across 48 of them. May I suggest that you limit the number of 
> stripes; a good number to start with would be 8. Also, for best results, 
> use the OST pools feature to arrange that each stripe goes to an OST 
> owned by a different OSS.
>
> regards,
>
> Wojciech
>
> On 23 May 2011 23:09, <kmehta at cs.uh.edu> wrote:
>
>     Actually, 'lfs check servers' returns 64 entries as well, so I 
>     presume the system documentation is out of date.
>
>     Again, I am sorry the basic information had been incorrect.
>
>     - Kshitij
>
>     > Run lfs getstripe <your_output_file> and paste the output of that 
>     > command to the mailing list.
>     > A stripe count of 48 is not possible if you have at most 11 OSTs 
>     > (the max stripe count would be 11).
>     > If your striping is correct, the bottleneck may be your client 
>     > network.
>     >
>     > regards,
>     >
>     > Wojciech
>     >
>     >
>     >
>     > On 23 May 2011 22:35, <kmehta at cs.uh.edu> wrote:
>     >
>     >> The stripe count is 48.
>     >>
>     >> Just fyi, this is what my application does:
>     >> A simple I/O test where threads continually write blocks of size 
>     >> 64 Kbytes or 1 Mbyte (decided at compile time) till a large file 
>     >> of, say, 16 Gbytes is created.
>     >>
>     >> Thanks,
>     >> Kshitij
>     >>
>     >> > What is your stripe count on the file?  If your default is 1, 
>     >> > you are only writing to one of the OSTs.  You can check with the 
>     >> > lfs getstripe command and set the stripe count higher; hopefully 
>     >> > your wide-striped file with threaded writes will be faster.
>     >> >
>     >> > Evan
>     >> >
>     >> > -----Original Message-----
>     >> > From: lustre-community-bounces at lists.lustre.org 
>     >> > [mailto:lustre-community-bounces at lists.lustre.org] On Behalf Of 
>     >> > kmehta at cs.uh.edu
>     >> > Sent: Monday, May 23, 2011 2:28 PM
>     >> > To: lustre-community at lists.lustre.org
>     >> > Subject: [Lustre-community] Poor multithreaded I/O performance
>     >> >
>     >> > Hello,
>     >> > I am running a multithreaded application that writes to a common 
>     >> > shared file on a Lustre fs, and this is what I see:
>     >> >
>     >> > If I have a single thread in my application, I get a bandwidth 
>     >> > of approx. 250 MBytes/sec (11 OSTs, 1 MByte stripe size).  
>     >> > However, if I spawn 8 threads such that all of them write to the 
>     >> > same file (non-overlapping locations), without explicitly 
>     >> > synchronizing the writes (i.e. I don't lock the file handle), I 
>     >> > still get the same bandwidth.
>     >> >
>     >> > Now, instead of writing to a shared file, if these threads write 
>     >> > to separate files, the bandwidth obtained is approx. 
>     >> > 700 MBytes/sec.
>     >> >
>     >> > I would ideally like my multithreaded application to see similar 
>     >> > scaling.  Any ideas why the performance is limited, and any 
>     >> > workarounds?
>     >> >
>     >> > Thank you,
>     >> > Kshitij



