[Lustre-discuss] IOR writing to a shared file, performance does not scale
Hedges, Richard M.
hedges1 at llnl.gov
Fri Feb 10 14:31:25 PST 2012
Did you set up the shared file for striping?
> hype356{rhedges}395: lfs help setstripe
> setstripe: Create a new file with a specific striping pattern or
> set the default striping pattern on an existing directory or
> delete the default striping pattern from an existing directory
> usage: setstripe [--size|-s stripe_size] [--count|-c stripe_count]
> [--index|-i|--offset|-o start_ost_index]
> [--pool|-p <pool>] <directory|filename>
> or
> setstripe -d <directory> (to delete default striping)
> stripe_size: Number of bytes on each OST (0 filesystem default)
> Can be specified with k, m or g (in KB, MB and GB
> respectively)
> start_ost_index: OST index of first stripe (-1 default)
> stripe_count: Number of OSTs to stripe over (0 default, -1 all)
> pool: Name of OST pool to use (default none)
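For example, striping the shared file over all 64 OSTs with a stripe size matching IOR's 32 MiB transfer size would look something like this (a sketch using the path from your command line; note the layout has to be set when the file is created, since striping on an existing non-empty file cannot be changed):

```shell
# Create the IOR output file striped across all OSTs (-c -1)
# with a 32 MB stripe size (-s 32m), matching the 32 MiB transfer size.
rm -f /fastfs/gabriel/ss_64/km_ior.out
lfs setstripe -s 32m -c -1 /fastfs/gabriel/ss_64/km_ior.out

# Check the resulting layout before running IOR:
lfs getstripe /fastfs/gabriel/ss_64/km_ior.out
```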
On 2/10/12 2:27 PM, "Kshitij Mehta" <kmehta at cs.uh.edu> wrote:
> We have Lustre 1.6.7 configured with 64 OSTs.
> I am testing performance using IOR, a parallel file system benchmark.
>
> When I run IOR with MPI such that all processes write to a shared file,
> performance does not scale. I tested with 1, 2, and 4 processes, and
> performance stays flat at about 230 MB/s.
>
> When processes write to separate files, performance improves greatly,
> reaching about 475 MB/s.
>
> Note that all processes are spawned on a single node.
>
> Here is the output:
> Writing to a shared file:
>
>> Command line used: ./IOR -a POSIX -b 2g -e -t 32m -w -o
>> /fastfs/gabriel/ss_64/km_ior.out
>> Machine: Linux deimos102
>>
>> Summary:
>> api = POSIX
>> test filename = /fastfs/gabriel/ss_64/km_ior.out
>> access = single-shared-file
>> ordering in a file = sequential offsets
>> ordering inter file= no tasks offsets
>> clients = 4 (4 per node)
>> repetitions = 1
>> xfersize = 32 MiB
>> blocksize = 2 GiB
>> aggregate filesize = 8 GiB
>>
>> Operation  Max (MiB)  Min (MiB)  Mean (MiB)  Std Dev  Max (OPs)  Min (OPs)  Mean (OPs)  Std Dev  Mean (s)
>> ---------  ---------  ---------  ----------  -------  ---------  ---------  ----------  -------  --------
>> write      233.61     233.61     233.61      0.00     7.30       7.30       7.30        0.00     35.06771  EXCEL
>>
>> Max Write: 233.61 MiB/sec (244.95 MB/sec)
>
> Writing to separate files:
>
>> Command line used: ./IOR -a POSIX -b 2g -e -t 32m -w -o
>> /fastfs/gabriel/ss_64/km_ior.out -F
>> Machine: Linux deimos102
>>
>> Summary:
>> api = POSIX
>> test filename = /fastfs/gabriel/ss_64/km_ior.out
>> access = file-per-process
>> ordering in a file = sequential offsets
>> ordering inter file= no tasks offsets
>> clients = 4 (4 per node)
>> repetitions = 1
>> xfersize = 32 MiB
>> blocksize = 2 GiB
>> aggregate filesize = 8 GiB
>>
>> Operation  Max (MiB)  Min (MiB)  Mean (MiB)  Std Dev  Max (OPs)  Min (OPs)  Mean (OPs)  Std Dev  Mean (s)
>> ---------  ---------  ---------  ----------  -------  ---------  ---------  ----------  -------  --------
>> write      475.95     475.95     475.95      0.00     14.87      14.87      14.87       0.00     17.21191  EXCEL
>>
>> Max Write: 475.95 MiB/sec (499.07 MB/sec)
>
> I am trying to understand where the bottleneck is when processes write
> to a shared file.
> Your help is appreciated.
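One thing worth checking first: with the default layout a newly created file typically has a stripe count of 1, so every byte of the shared file lives on a single OST and all four writers serialize behind it, while file-per-process spreads the files over different OSTs. A minimal sketch of Lustre's round-robin (RAID-0) stripe mapping illustrates the effect (`ost_slot` is a hypothetical helper for illustration, not a Lustre API):

```python
def ost_slot(offset, stripe_size, stripe_count):
    """Return the stripe slot (0..stripe_count-1) that holds byte `offset`
    under a round-robin RAID-0 layout."""
    return (offset // stripe_size) % stripe_count

# With a stripe count of 1, every offset maps to slot 0: all writers
# contend for one OST no matter how many processes there are.
assert all(ost_slot(off, 1 << 20, 1) == 0
           for off in range(0, 1 << 24, 1 << 20))

# With 4 stripes of 32 MiB (the IOR transfer size), consecutive 32 MiB
# transfers fan out across 4 different OSTs.
slots = [ost_slot(i * (32 << 20), 32 << 20, 4) for i in range(8)]
print(slots)  # -> [0, 1, 2, 3, 0, 1, 2, 3]
```

If `lfs getstripe` on the shared file shows a stripe count of 1, that would be consistent with the flat ~230 MB/s you are seeing.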
====================================================
Richard Hedges
Customer Support and Test - File Systems Project
Development Environment Group - Livermore Computing
Lawrence Livermore National Laboratory
7000 East Avenue, MS L-557
Livermore, CA 94551
v: (925) 423-2699
f: (925) 423-6961
E: richard-hedges at llnl.gov