[Lustre-discuss] Poor Direct-IO Performance with Lustre-2.1.5

Chan Ching Yu, Patrick cychan at clustertech.com
Fri Jun 21 02:49:02 PDT 2013


Hi,

I am experiencing poor direct-I/O performance with Lustre 2.1.5 (latest stable) on CentOS 6.3.

Two OSS servers connect to the same MD3200 (daisy-chained with 4 MD1200 enclosures).
5 disks (from each enclosure) form a RAID-5 virtual disk, and each virtual disk is used as an OST.
8 OSTs are created in the file system.

The RAID segment size is 256K, so the full stripe size is 1MB (4 data disks x 256K).
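
(For reference, the Lustre stripe layout of the test files can be checked with lfs getstripe; the path /mnt/lustre below and the setstripe layout shown are only examples, not necessarily what is configured here.)

prompt$  lfs getstripe /mnt/lustre/iozone_dir               # show stripe count/size used by the test directory
prompt$  lfs setstripe -c -1 -s 1m /mnt/lustre/iozone_dir   # example only: stripe new files over all OSTs, 1MB stripe (2.1-era -s option)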

4 clients connect to the OSS servers over 10 Gigabit Ethernet.
Network performance between the servers and clients is normal: about 1GB/s of throughput is obtained with both netperf and LNET selftest.
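
(The LNET selftest was run roughly along the lines below; the NIDs are placeholders, not the real addresses.)

prompt$  export LST_SESSION=$$
prompt$  lst new_session rw_test
prompt$  lst add_group servers 192.168.1.[1-2]@tcp          # placeholder OSS NIDs
prompt$  lst add_group clients 192.168.1.[11-14]@tcp        # placeholder client NIDs
prompt$  lst add_batch bulk_w
prompt$  lst add_test --batch bulk_w --from clients --to servers brw write size=1M
prompt$  lst run bulk_w
prompt$  lst stat clients                                   # reports roughly 1GB/s here
prompt$  lst end_session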

The 4 clients run iozone to write 4GB files:

prompt$  RSH=ssh iozone -i 0 -I -M -C -w -r 1m -t 4 -s 4g -+m /root/iozone_clients
....
        O_DIRECT feature enabled

        Machine = Linux cluster.iseis.cuhk.edu.hk 2.6.32-279.14.1.el6.x86_64 #1 SMP Tu
        Setting no_unlink
        Setting no_unlink
        Record Size 1024 KB
        File size set to 4194304 KB
        Network distribution mode enabled.
        Command line used: iozone -i 0 -I -M -C -w -r 1m -t 4 -s 4g -+m /root/iozone_clients
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
        Throughput test with 4 processes
        Each process writes a 4194304 Kbyte file in 1024 Kbyte records

        Test running:
        Children see throughput for  4 initial writers  =  238195.08 KB/sec
        Min throughput per process                      =   58905.94 KB/sec
        Max throughput per process                      =   60983.77 KB/sec
        Avg throughput per process                      =   59548.77 KB/sec
        Min xfer                                        = 4051968.00 KB
        Child[0] xfer count = 4194304.00 KB, Throughput =   60983.77 KB/sec
        Child[1] xfer count = 4066304.00 KB, Throughput =   59111.19 KB/sec
        Child[2] xfer count = 4071424.00 KB, Throughput =   59194.18 KB/sec
        Child[3] xfer count = 4051968.00 KB, Throughput =   58905.94 KB/sec

An aggregate throughput of 238 MB/s is obtained.
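
(For comparison, the same workload can be rerun without -I, i.e. buffered I/O, as a baseline to see how much of the gap is specific to O_DIRECT; sketch below.)

prompt$  RSH=ssh iozone -i 0 -M -C -w -r 1m -t 4 -s 4g -+m /root/iozone_clients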

That is only about 30MB/s of throughput per OST (238 / 8), as seen in the Dell Storage Manager Performance Monitor.
I consider this poor, since each OST is a RAID-5 volume with 4 effective data disks.
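
(To check whether the 1MB client writes actually reach the OSTs as full-size RPCs, the OSS-side I/O size histograms and the client-side RPC limits can be dumped as below; the parameter names are the ones I expect on 2.1.x.)

prompt$  lctl get_param obdfilter.*.brw_stats               # on each OSS: bulk RPC / disk I/O size histograms per OST
prompt$  lctl get_param osc.*.max_pages_per_rpc osc.*.max_rpcs_in_flight   # on each client: per-OST RPC size and concurrency limits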

Why is the direct-I/O performance so slow? Thanks in advance.

