[Lustre-discuss] IOR writing to a shared file, performance does not scale

Michael Kluge Michael.Kluge at tu-dresden.de
Fri Feb 10 23:29:03 PST 2012


Hi Kshitij,

I would recommend running sgpdd-survey on the servers, first for one 
disk and then for multiple disks, followed by obdfilter-survey. That 
tells you what your storage can deliver. You could then run LNET tests 
as well to check whether the network works fine. If the disks and the 
network deliver the expected performance, IOR will most probably 
perform well too.

Please see:
http://wiki.lustre.org/images/4/40/Wednesday_shpc-2009-benchmarking.pdf
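
The bottom-up approach above (disks, then OST layer, then network, then
IOR) amounts to finding the first layer that falls clearly short of what
you expect from it. A minimal sketch of that comparison follows. The
function name, the 80% tolerance, and every number in the list are
placeholders for illustration only, not measurements from any real
system; substitute the results of your own survey runs.

```python
# Sketch only: all throughput figures below are made-up placeholders (MiB/s).
def first_shortfall(layers, tolerance=0.8):
    """Return the name of the first layer whose measured throughput is
    below tolerance * expected, or None if every layer is acceptable."""
    for name, measured, expected in layers:
        if measured < tolerance * expected:
            return name
    return None


# Layers listed bottom-up, in the order the tests would be run.
layers = [
    ("sgpdd-survey (raw disks)", 900.0, 1000.0),
    ("obdfilter-survey (OST layer)", 850.0, 1000.0),
    ("LNET test (network)", 400.0, 1000.0),
    ("IOR (client)", 230.0, 1000.0),
]

print(first_shortfall(layers))
```

With these placeholder values the network layer is the first one
flagged, so there would be little point in tuning IOR before looking at
the network.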


Regards, Michael

On 10.02.2012 23:27, Kshitij Mehta wrote:
> We have Lustre 1.6.7 configured using 64 OSTs.
> I am testing the performance using IOR, which is a file system benchmark.
>
> When I run IOR using MPI such that processes write to a shared file,
> performance does not scale. I tested with 1, 2 and 4 processes, and the
> performance remains constant at about 230 MiB/s.
>
> When processes write to separate files, performance improves greatly,
> reaching 475 MiB/s.
>
> Note that all processes are spawned on a single node.
>
> Here is the output:
> Writing to a shared file:
>
>> Command line used: ./IOR -a POSIX -b 2g -e -t 32m -w -o /fastfs/gabriel/ss_64/km_ior.out
>> Machine: Linux deimos102
>>
>> Summary:
>>          api                = POSIX
>>          test filename      = /fastfs/gabriel/ss_64/km_ior.out
>>          access             = single-shared-file
>>          ordering in a file = sequential offsets
>>          ordering inter file= no tasks offsets
>>          clients            = 4 (4 per node)
>>          repetitions        = 1
>>          xfersize           = 32 MiB
>>          blocksize          = 2 GiB
>>          aggregate filesize = 8 GiB
>>
>> Operation  Max (MiB)  Min (MiB)  Mean (MiB)   Std Dev  Max (OPs)  Min (OPs)  Mean (OPs)   Std Dev  Mean (s)
>> ---------  ---------  ---------  ----------   -------  ---------  ---------  ----------   -------  --------
>> write         233.61     233.61      233.61      0.00       7.30       7.30        7.30      0.00  35.06771   EXCEL
>>
>> Max Write: 233.61 MiB/sec (244.95 MB/sec)
>
> Writing to separate files:
>
>> Command line used: ./IOR -a POSIX -b 2g -e -t 32m -w -o /fastfs/gabriel/ss_64/km_ior.out -F
>> Machine: Linux deimos102
>>
>> Summary:
>>          api                = POSIX
>>          test filename      = /fastfs/gabriel/ss_64/km_ior.out
>>          access             = file-per-process
>>          ordering in a file = sequential offsets
>>          ordering inter file= no tasks offsets
>>          clients            = 4 (4 per node)
>>          repetitions        = 1
>>          xfersize           = 32 MiB
>>          blocksize          = 2 GiB
>>          aggregate filesize = 8 GiB
>>
>> Operation  Max (MiB)  Min (MiB)  Mean (MiB)   Std Dev  Max (OPs)  Min (OPs)  Mean (OPs)   Std Dev  Mean (s)
>> ---------  ---------  ---------  ----------   -------  ---------  ---------  ----------   -------  --------
>> write         475.95     475.95      475.95      0.00      14.87      14.87       14.87      0.00  17.21191   EXCEL
>>
>> Max Write: 475.95 MiB/sec (499.07 MB/sec)
>
> I am trying to understand where the bottleneck is, when processes write
> to a shared file.
> Your help is appreciated.
>
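
For reference, the bandwidth figures in the two IOR summaries quoted
above can be cross-checked from the reported client count, block size
and mean time. The sketch below assumes that IOR's aggregate size is
clients x blocksize and that "MB" means 10^6 bytes; the function name is
just illustrative.

```python
MIB = 1024 ** 2   # bytes per MiB
MB = 1000 ** 2    # bytes per MB (decimal), assumed for IOR's "MB/sec"


def ior_bandwidth(clients, block_gib, mean_seconds):
    """Return (MiB/s, MB/s) for `clients` tasks each writing `block_gib` GiB."""
    total_mib = clients * block_gib * 1024      # aggregate file size in MiB
    mib_per_s = total_mib / mean_seconds
    mb_per_s = mib_per_s * MIB / MB             # convert binary to decimal units
    return mib_per_s, mb_per_s


# Values taken from the two runs quoted above (4 clients, 2 GiB blocks).
shared = ior_bandwidth(4, 2, 35.06771)      # single shared file
separate = ior_bandwidth(4, 2, 17.21191)    # file per process (-F)

print("shared:   %.2f MiB/s (%.2f MB/s)" % shared)
print("separate: %.2f MiB/s (%.2f MB/s)" % separate)
print("ratio:    %.2f" % (separate[0] / shared[0]))
```

The results should agree with the "Max Write" lines to within rounding,
and the ratio shows that file-per-process is roughly twice as fast as
the shared file for the same 8 GiB written from one node.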

-- 
Dr.-Ing. Michael Kluge

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:    (+49) 351 463-37773
e-mail: michael.kluge at tu-dresden.de
WWW:    http://www.tu-dresden.de/zih


