[Lustre-discuss] Poor Direct-IO Performance with Lustre-2.1.5
Chan Ching Yu, Patrick
cychan at clustertech.com
Fri Jun 21 02:49:02 PDT 2013
I am experiencing poor direct-IO performance using Lustre 2.1.5 (latest stable) on CentOS 6.3.
Two OSS servers connect to the same MD3200 (daisy-chained with 4 MD1200 enclosures).
5 disks (from each MD) form a RAID-5 virtual disk as an OST.
8 OSTs are created in the file system.
RAID segment size is 256K, stripe size is 1MB.
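The geometry above can be checked with a little arithmetic. The following is only a sketch built from the numbers quoted in this message: a 5-disk RAID-5 set has 4 data disks, so 4 x 256K segments give the 1MB stripe, and the 1 MB iozone record size is a whole multiple of it. Whether writes actually arrive stripe-aligned at the RAID controller depends on partition and file system alignment, which is not established here. The variable and function names are illustrative only.

```python
# Sketch: RAID-5 full-stripe width from the figures quoted in the post.
disks_per_ost = 5    # disks in each RAID-5 virtual disk (one OST)
parity_disks = 1     # RAID-5 spends one disk's worth of capacity on parity
segment_kib = 256    # per-disk segment size

data_disks = disks_per_ost - parity_disks
full_stripe_kib = data_disks * segment_kib

record_kib = 1024    # iozone record size (-r 1m)


def is_full_stripe_multiple(io_kib, stripe_kib):
    """True when an I/O size is a whole number of full stripes."""
    return io_kib % stripe_kib == 0


print(data_disks, full_stripe_kib,
      is_full_stripe_multiple(record_kib, full_stripe_kib))
# prints: 4 1024 True
```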
4 clients connect to the OSS servers over 10 Gigabit Ethernet.
Network performance between servers and clients is normal: 1 GB/s throughput is obtained in netperf and LNet self-test.
The 4 clients run iozone, each writing a 4 GB file.
prompt$ RSH=ssh iozone -i 0 -I -M -C -w -r 1m -t 4 -s 4g -+m /root/iozone_clients
O_DIRECT feature enabled
Machine = Linux cluster.iseis.cuhk.edu.hk 2.6.32-279.14.1.el6.x86_64 #1 SMP Tu
Setting no_unlink
Record Size 1024 KB
File size set to 4194304 KB
Network distribution mode enabled.
Command line used: iozone -i 0 -I -M -C -w -r 1m -t 4 -s 4g -+m /root/iozone_clients
Output is in Kbytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
Throughput test with 4 processes
Each process writes a 4194304 Kbyte file in 1024 Kbyte records
Children see throughput for 4 initial writers = 238195.08 KB/sec
Min throughput per process = 58905.94 KB/sec
Max throughput per process = 60983.77 KB/sec
Avg throughput per process = 59548.77 KB/sec
Min xfer = 4051968.00 KB
Child xfer count = 4194304.00 KB, Throughput = 60983.77 KB/sec
Child xfer count = 4066304.00 KB, Throughput = 59111.19 KB/sec
Child xfer count = 4071424.00 KB, Throughput = 59194.18 KB/sec
Child xfer count = 4051968.00 KB, Throughput = 58905.94 KB/sec
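As a quick sanity check on the output above, the per-child figures can be summed to reproduce the aggregate that iozone reports. This is only a sketch; the numbers are copied from the four "Child xfer count" lines and the variable names are illustrative.

```python
# Sketch: reproduce iozone's aggregate and average from the per-child lines.
child_kb_per_sec = [60983.77, 59111.19, 59194.18, 58905.94]

aggregate = sum(child_kb_per_sec)
average = aggregate / len(child_kb_per_sec)

print(f"aggregate = {aggregate:.2f} KB/sec")
print(f"average   = {average:.2f} KB/sec")
print(f"min / max = {min(child_kb_per_sec):.2f} / {max(child_kb_per_sec):.2f} KB/sec")
# prints:
# aggregate = 238195.08 KB/sec
# average   = 59548.77 KB/sec
# min / max = 58905.94 / 60983.77 KB/sec
```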
An aggregate throughput of 238 MB/s is obtained.
That is only about 30 MB/s (238 / 8) for each OST (as seen in the Dell Storage Manager Performance Monitor).
I consider this poor, since each OST is a RAID-5 volume with 4 effective data disks.
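The per-OST estimate can be worked through as follows. This is a rough sketch that assumes, as the 238 / 8 figure does, that the load is spread evenly over all 8 OSTs and over the 4 data disks in each; it uses 1 MB = 1000 KB to match the rounding used in this message.

```python
# Sketch: per-OST and per-data-disk write rate, assuming an even spread.
aggregate_kb_per_sec = 238195.08   # iozone "Children see throughput" figure
num_osts = 8
data_disks_per_ost = 4             # 5-disk RAID-5 minus one disk of parity

per_ost = aggregate_kb_per_sec / num_osts
per_data_disk = per_ost / data_disks_per_ost

print(f"per OST:       {per_ost / 1000:.1f} MB/s")
print(f"per data disk: {per_data_disk / 1000:.1f} MB/s")
# prints:
# per OST:       29.8 MB/s
# per data disk: 7.4 MB/s
```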
Why is the direct I/O performance so slow? Thanks in advance.