[Lustre-discuss] OST - low MB/s

Rafael David Tinoco Rafael.Tinoco at Sun.COM
Thu Sep 10 14:33:14 PDT 2009


I think I've discovered the problem.
I was using multipathd underneath my "raid" devices,
getting around 200MB/s in RAID6 with 10 disks.

Now, testing without multipath:

root@a02n00:~# mdadm --detail /dev/md20
/dev/md20:
        Version : 00.90.03
  Creation Time : Thu Sep 10 18:27:28 2009
     Raid Level : raid6
     Array Size : 7814099968 (7452.11 GiB 8001.64 GB)
  Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
   Raid Devices : 10
  Total Devices : 10
Preferred Minor : 20
    Persistence : Superblock is persistent

    Update Time : Thu Sep 10 18:27:28 2009
          State : clean
 Active Devices : 10
Working Devices : 10
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 128K

           UUID : 9cf9dd02:d53bc608:62e867a4:1df781ca
         Events : 0.1

    Number   Major   Minor   RaidDevice State
       0      66      144        0      active sync   /dev/sdap
       1      66      160        1      active sync   /dev/sdaq
       2      66      176        2      active sync   /dev/sdar
       3      66      192        3      active sync   /dev/sdas
       4      66      208        4      active sync   /dev/sdat
       5      66      224        5      active sync   /dev/sdau
       6      66      240        6      active sync   /dev/sdav
       7      67        0        7      active sync   /dev/sdaw
       8       8       16        8      active sync   /dev/sdb
       9       8      112        9      active sync   /dev/sdh

root@a02n00:~# dd if=/dev/zero of=/dev/md20 bs=1024k count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 21.0579 seconds, 498 MB/s

root@a02n00:~# dd if=/dev/zero of=/dev/md20 bs=1024k count=99999
99999+0 records in
99999+0 records out
104856551424 bytes (105 GB) copied, 221.137 seconds, 474 MB/s

Much better :D

So basically Linux + MPT Fusion + multipathd + mdadm is not such a good option for an OST!
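
To put the dd results in per-disk terms, here is a minimal sketch. It assumes a 10-disk RAID6 has 8 data disks and that the aggregate rate is spread evenly across them. The helper name `per_disk_mbs` is made up for illustration; the two rates are the 1024k dd figures reported in this thread (216 MB/s through multipathd, 498 MB/s without it).

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): split an aggregate MB/s figure
# evenly across the data disks of a RAID6 array.
# RAID6 uses two disks' worth of space for parity, so 10 disks -> 8 data disks.
per_disk_mbs() {
    total_mbs=$1
    raid_disks=$2
    data_disks=$((raid_disks - 2))
    echo $((total_mbs / data_disks))   # integer division, rounds down
}

with_multipath=$(per_disk_mbs 216 10)      # 1024k dd through multipathd
without_multipath=$(per_disk_mbs 498 10)   # 1024k dd on the plain sd devices

echo "with multipathd:    ~${with_multipath} MB/s per data disk"
echo "without multipathd: ~${without_multipath} MB/s per data disk"
```

Against the 50MB/s measured on a single disk, the first figure is well below what one disk can do on its own, while the second is slightly above it.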

-----Original Message-----
From: Hung-Sheng.Tsao at Sun.COM [mailto:Hung-Sheng.Tsao at Sun.COM] 
Sent: Thursday, September 10, 2009 6:25 PM
To: Rafael David Tinoco
Subject: Re: [Lustre-discuss] OST - low MB/s

So what is the output if you use bs=128k*8?
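
The block size being asked about can be worked out as below. This is only a sketch: the variable names are illustrative, the chunk size and disk count are taken from the mdadm --detail output in this thread, and the dd command is printed rather than executed because writing to /dev/md20 destroys whatever is on the array.

```shell
#!/bin/sh
# Values taken from the mdadm --detail output; variable names are illustrative.
chunk_kib=128     # Chunk Size : 128K
raid_disks=10     # Raid Devices : 10
parity_disks=2    # RAID6 keeps two parity blocks per stripe

data_disks=$((raid_disks - parity_disks))
full_stripe_kib=$((chunk_kib * data_disks))

# Only print the command: dd onto /dev/md20 overwrites the array contents.
dd_cmd="dd if=/dev/zero of=/dev/md20 bs=${full_stripe_kib}k count=10000"
echo "full stripe: ${full_stripe_kib} KiB"
echo "$dd_cmd"
```

A write of exactly one full stripe lets md compute parity from the new data alone, without first reading old data or parity back from the disks.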


Rafael David Tinoco wrote:
> My journal device is:
>
> root@a01n00:~# mdadm --detail /dev/md10
> /dev/md10:
>         Version : 00.90.03
>   Creation Time : Thu Sep 10 17:49:07 2009
>      Raid Level : raid1
>      Array Size : 987840 (964.85 MiB 1011.55 MB)
>   Used Dev Size : 987840 (964.85 MiB 1011.55 MB)
>    Raid Devices : 2
>   Total Devices : 2
> Preferred Minor : 10
>     Persistence : Superblock is persistent
>
>     Update Time : Thu Sep 10 17:49:07 2009
>           State : clean
>  Active Devices : 2
> Working Devices : 2
>  Failed Devices : 0
>   Spare Devices : 0
>
>            UUID : e48152dd:adb1c505:137aa99c:1b3eece4
>          Events : 0.1
>
>     Number   Major   Minor   RaidDevice State
>        0     253       17        0      active sync   /dev/dm-17
>        1     253       14        1      active sync   /dev/dm-14
>
> My OST device is:
>
> root@a01n00:~# mdadm --detail /dev/md20
> /dev/md20:
>         Version : 00.90.03
>   Creation Time : Thu Sep 10 17:49:23 2009
>      Raid Level : raid6
>      Array Size : 7814099968 (7452.11 GiB 8001.64 GB)
>   Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
>    Raid Devices : 10
>   Total Devices : 10
> Preferred Minor : 20
>     Persistence : Superblock is persistent
>
>     Update Time : Thu Sep 10 18:06:20 2009
>           State : clean
>  Active Devices : 10
> Working Devices : 10
>  Failed Devices : 0
>   Spare Devices : 0
>
>      Chunk Size : 128K
>
>            UUID : b80fb16d:38c47a56:fdf2b5e9:9ff47af3
>          Events : 0.2
>
>     Number   Major   Minor   RaidDevice State
>        0     253       11        0      active sync   /dev/dm-11
>        1     253       12        1      active sync   /dev/dm-12
>        2     253       13        2      active sync   /dev/dm-13
>        3     253       15        3      active sync   /dev/dm-15
>        4     253       16        4      active sync   /dev/dm-16
>        5     253       18        5      active sync   /dev/dm-18
>        6     253       19        6      active sync   /dev/dm-19
>        7     253       20        7      active sync   /dev/dm-20
>        8     253        1        8      active sync   /dev/dm-1
>        9     253       21        9      active sync   /dev/dm-21
>
> -----Original Message-----
> From: Hung-Sheng.Tsao at Sun.COM [mailto:Hung-Sheng.Tsao at Sun.COM] 
> Sent: Thursday, September 10, 2009 6:19 PM
> To: Rafael David Tinoco
> Cc: lustre-discuss at lists.lustre.org
> Subject: Re: [Lustre-discuss] OST - low MB/s
>
> Not sure I understand your setup.
> Which one is the RAID6 LUN?
> Which are the individual HDs?
>
>
> Rafael David Tinoco wrote:
>   
>> 216MB/s using 8*128k (1024k) as bs. Too low for 8 active disks, right? That is around 27MB/s per disk, versus 50MB/s from the "real" disk.
>>
>> -----Original Message-----
>> From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Dr. Hung-Sheng Tsao
>> (LaoTsao)
>> Sent: Thursday, September 10, 2009 5:50 PM
>> To: Rafael David Tinoco
>> Cc: lustre-discuss at lists.lustre.org
>> Subject: Re: [Lustre-discuss] OST - low MB/s
>>
>> With RAID6 and chunk size=128k, the full stripe size will be 128k*8 (for 10 disks,
>> 8+2 RAID6).
>> In your dd test you should use bs=128k*8, so that all 8 data HDDs are kept busy.
>> Regards
>>
>>
>> Rafael David Tinoco wrote:
>>   
>>     
>>> With this RAID6 configuration I'm getting:
>>>
>>> root@a02n00:~# dd if=/dev/zero of=/dev/md20 bs=128k count=10000
>>> 10000+0 records in
>>> 10000+0 records out
>>> 1310720000 bytes (1.3 GB) copied, 5.20774 seconds, 252 MB/s
>>>
>>> root@a02n00:~# dd if=/dev/zero of=/dev/md20 bs=128k count=10000
>>> 10000+0 records in
>>> 10000+0 records out
>>> 1310720000 bytes (1.3 GB) copied, 5.12 seconds, 256 MB/s
>>>
>>> So 80MB/s using these md20 devices as OSTs isn't quite right.
>>>
>>> *From:* lustre-discuss-bounces at lists.lustre.org 
>>> [mailto:lustre-discuss-bounces at lists.lustre.org] *On Behalf Of *Rafael 
>>> David Tinoco
>>> *Sent:* Thursday, September 10, 2009 4:26 PM
>>> *To:* lustre-discuss at lists.lustre.org
>>> *Subject:* [Lustre-discuss] OST - low MB/s
>>>
>>> Hello,
>>>
>>> I'm now having problems with the throughput of my OSTs.
>>>
>>> I have 4 OSSs, each with 2 OSTs. These OSTs are RAID6 with 10 disks and a
>>> chunk size of 128k.
>>>
>>> These disks are in a J4400 (JBOD), connected in multipath using multipathd.
>>>
>>> Each disk gives me 50MB/s with dd.
>>>
>>> With Lustre, using IOR or dd, I can get only around 80MB/s. With 8 active
>>> data disks in the RAID I was expecting 8*50 = something between 300 and
>>> 400MB/s.
>>>
>>> avg-cpu:   %user   %nice %system %iowait  %steal   %idle
>>>             0.00    0.00    6.00    9.06    0.00   84.94
>>>
>>> Device:   rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await   svctm   %util
>>> md10        0.00    0.00    0.00  398.00    0.00    1.55     8.00     0.00    0.00    0.00    0.00
>>> md11        0.00    0.00    0.00  380.00    0.00    1.48     8.00     0.00    0.00    0.00    0.00
>>> md20        0.00    0.00    0.00  158.00    0.00   79.00  1024.00     0.00    0.00    0.00    0.00
>>> md21        0.00    0.00    0.00  159.00    0.00   79.50  1024.00     0.00    0.00    0.00    0.00
>>>
>>> avg-cpu:   %user   %nice %system %iowait  %steal   %idle
>>>             0.00    0.00    5.94    9.32    0.00   84.74
>>>
>>> Device:   rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await   svctm   %util
>>> md10        0.00    0.00    0.00  407.50    0.00    1.59     8.00     0.00    0.00    0.00    0.00
>>> md11        0.00    0.00    0.00  394.00    0.00    1.54     8.00     0.00    0.00    0.00    0.00
>>> md20        0.00    0.00    0.00  159.00    0.00   79.50  1024.00     0.00    0.00    0.00    0.00
>>> md21        0.00    0.00    0.00  158.00    0.00   79.00  1024.00     0.00    0.00    0.00    0.00
>>>
>>> avg-cpu:   %user   %nice %system %iowait  %steal   %idle
>>>             0.00    0.00    6.37    9.43    0.00   84.21
>>>
>>> Device:   rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await   svctm   %util
>>> md10        0.00    0.00    0.00  410.50    0.00    1.60     8.00     0.00    0.00    0.00    0.00
>>> md11        0.00    0.00    0.00  376.00    0.00    1.47     8.00     0.00    0.00    0.00    0.00
>>> md20        0.00    0.00    0.00  165.00    0.00   82.50  1024.00     0.00    0.00    0.00    0.00
>>> md21        0.00    0.00    0.00  165.00    0.00   82.50  1024.00     0.00    0.00    0.00    0.00
>>>
>>> Any clues?
>>>
>>> Rafael David Tinoco - Sun Microsystems
>>>
>>> Systems Engineer - High Performance Computing
>>>
>>> Rafael.Tinoco at Sun.COM - 55.11.5187.2194
>>>
>>> ------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> Lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>   
>>>     
>>>       
>
>   



