[Lustre-discuss] Lustre MPI-IO performance on CNL

Wed Mar 5 10:39:39 PST 2008

Hi,
Weikuan Yu wrote:
> Hi,
>
> The I/O performance of CNL (as measured with IOR) seems quite different
> for a shared file compared with separate files.
>
> Here are some numbers on a smaller file system on an XT system at ORNL.
> All files are striped across 72 OSTs. I deliberately use a block size of
> 8512m.
>
> 1. sample tests with separate files
> # aprun -n 32 -N 1 ~/benchmarks/IOR-2.9.1/src/C/IOR -a MPIIO -b 8512m -t 
> 64m -d 1 -i 2 -w -r -g -F -o iortes
> Max Write: 9978.18 MiB/sec (10462.88 MB/sec)
> Max Read:  5612.78 MiB/sec (5885.43 MB/sec)
>
> 2. sample shared-file performance
> # aprun -n 32 -N 1 ~/benchmarks/IOR-2.9.1/src/C/IOR -a MPIIO -b 8512m -t 
> 64m -d 1 -i 2 -w -r -g -o iortes
> Max Write: 6817.31 MiB/sec (7148.47 MB/sec)
> Max Read:  5591.98 MiB/sec (5863.62 MB/sec)
>
> In addition, using my experimental MPI-IO library, I noticed that 
> enabling direct I/O can have various effects for I/O on CNL.
>   
What is the stripe_size in this test? 4M? If it is 4M, then the
transfer_size (64M) is larger than the stripe_size. We have seen this
situation before. In the end it turned out that each client holds too
many locks per write (because of Lustre's down-forward extent lock
policy), which can block other clients' writes and so hurt the
parallelism of the whole system. Could you try decreasing the transfer
size to the stripe_size, or increasing the stripe_size to 64M, and see
how it performs?
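
To make the transfer_size versus stripe_size point concrete, here is a
rough Python sketch that counts which OSTs a single write touches. It is
only arithmetic under assumptions, not a model of the lock manager: it
assumes plain round-robin striping starting at the first OST of the
file's layout, that the stripe_size really is 4M (my guess above), and
that in shared-file mode rank r's block starts at offset r * block_size.
The function names are made up for illustration.

```python
MIB = 1024 * 1024

def osts_touched(offset, length, stripe_size, stripe_count):
    """OST indices (relative to the file's layout) covered by one
    contiguous byte range, assuming round-robin striping."""
    first = offset // stripe_size               # first stripe in the range
    last = (offset + length - 1) // stripe_size  # last stripe in the range
    if last - first + 1 >= stripe_count:
        return list(range(stripe_count))         # range wraps over every OST
    return sorted({s % stripe_count for s in range(first, last + 1)})

def first_transfer_osts(rank, block_size, transfer_size,
                        stripe_size, stripe_count):
    """OSTs hit by a rank's first transfer in a shared file.
    Assumption: rank r's block begins at r * block_size."""
    return osts_touched(rank * block_size, transfer_size,
                        stripe_size, stripe_count)

if __name__ == "__main__":
    block, xfer, count = 8512 * MIB, 64 * MIB, 72
    for stripe in (4 * MIB, 64 * MIB):
        print("stripe_size = %dM" % (stripe // MIB))
        for rank in range(3):
            osts = first_transfer_osts(rank, block, xfer, stripe, count)
            print("  rank %d: %d OST(s) %s" % (rank, len(osts), osts))
```

Under these assumptions, each 64M write spans 16 stripes (and so 16
OSTs) with a 4M stripe_size, and the ranges of different ranks overlap on
some OSTs. With a 64M stripe_size, each 64M write stays within a single
stripe on a single OST.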

Thanks
WangDi
> 3. sample separate files with direct I/O
> export MPIO_DIRECT_WRITE=true; export MPIO_DIRECT_READ=true; aprun -n 32 
> -N 1 ~/benchmarks/IOR-2.10.1/src/C/IOR -a MPIIO -b 8512m -t 64m -d 1 -i 
> 2 -w -r -g -F -k -o lustre:iortest
> Max Write: 9353.66 MiB/sec (9808.03 MB/sec)
> Max Read:  8269.28 MiB/sec (8670.97 MB/sec)
>
> 4. sample shared-file performance with direct I/O
> # export MPIO_DIRECT_WRITE=true; export MPIO_DIRECT_READ=true; aprun -n 
> 32 -N 1 ~/benchmarks/IOR-2.10.1/src/C/IOR -a MPIIO -b 8512m -t 64m -d 1 
> -i 2 -w -r -g -k -o lustre:iortes
> Max Write: 9484.11 MiB/sec (9944.81 MB/sec)
> Max Read:  7929.63 MiB/sec (8314.81 MB/sec)
>
> It seems direct I/O helps quite a bit on the performance of parallel 
> reads, but not on writes. The shared file mode appears to benefit more 
> from direct write.
>
> While it is understandable that the client cache can play a big role
> here, I am not sure why the shared-file mode benefits so much more.
> Can anybody help explain the differences between reads and writes, and
> between the shared-file and separate-file modes?
>
> Also let me know if I am not clear in my descriptions.
>
>