[Lustre-discuss] Fragmented I/O
Kevin Hildebrand
kevin at umd.edu
Thu May 12 04:11:29 PDT 2011
The PERC 6 and H800 use the megaraid_sas driver; I'm currently running
version 00.00.04.17-RH1.
The max_sectors values (320) are what is set by default. I am able to set
max_sectors_kb to something smaller than 320, but not larger.
Kevin
On Wed, 11 May 2011, Kevin Van Maren wrote:
> You didn't say, but I think they are LSI-based: are you using the mptsas
> driver with the PERC cards? Which driver version?
>
> First, max_sectors_kb should normally be set to a power of two,
> like 256, rather than an odd size like 320. This number should also match
> the native RAID size of the device, to avoid read-modify-write cycles.
> (See Bug 22886 on why not to make it > 1024 in general.)
>
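As a rough illustration of the sizing advice above, here is a minimal shell sketch that picks the largest power of two not exceeding a given max_hw_sectors_kb value, capped at 1024 per the Bug 22886 caveat. The function name pick_max_sectors_kb is hypothetical, and the helper knows nothing about the array's RAID geometry, so its result would still need to be checked against the native RAID size.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre or the kernel): print the largest
# power of two, in KB, that is <= the given hardware limit and <= 1024.
pick_max_sectors_kb() {
    hw_kb=$1
    kb=1
    # Keep doubling while the next power of two still fits both limits.
    while [ $((kb * 2)) -le "$hw_kb" ] && [ $((kb * 2)) -le 1024 ]; do
        kb=$((kb * 2))
    done
    echo "$kb"
}

# Possible use as root, with sdb as in this thread (left commented out
# because it writes to sysfs):
#   q=/sys/block/sdb/queue
#   pick_max_sectors_kb "$(cat $q/max_hw_sectors_kb)" > $q/max_sectors_kb
```

For the 320 reported in this thread, `pick_max_sectors_kb 320` prints 256.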
> See Bug 17086 for patches to increase the max_sectors_kb limitation for
> the mptsas driver to 1MB, or the true hardware maximum, rather than a
> driver limit; however, the hardware may still be limited to sizes < 1MB.
>
> Also, to clarify the sizes: the smallest bucket >= transfer_size is the
> one incremented, so a 320KB IO increments the 512KB bucket. Since your
> HW says it can only do a 320KB IO, there will never be a 1MB IO.
>
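The bucket rule above can be sketched in a few lines of shell. This is only an illustration, not Lustre code: the function names are hypothetical, the buckets are assumed to be powers of two starting at 4K (as in the I/O size table), and a full RPC is assumed to be 1024 KB (256 pages of 4 KB) that gets split greedily at the max_sectors_kb limit.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not Lustre code.

# Print the smallest power-of-two bucket (in KB, starting at 4) that is
# >= the given I/O size in KB.
io_bucket_kb() {
    size_kb=$1
    bucket=4
    while [ "$bucket" -lt "$size_kb" ]; do
        bucket=$((bucket * 2))
    done
    echo "$bucket"
}

# Split one RPC of $1 KB into I/Os of at most $2 KB each (assumed greedy
# split) and print the histogram bucket each fragment would land in.
split_rpc() {
    remaining=$1
    max_kb=$2
    while [ "$remaining" -gt 0 ]; do
        if [ "$remaining" -gt "$max_kb" ]; then
            chunk=$max_kb
        else
            chunk=$remaining
        fi
        echo "${chunk}K -> $(io_bucket_kb "$chunk")K bucket"
        remaining=$((remaining - chunk))
    done
}
```

With the numbers from this thread, `split_rpc 1024 320` prints three `320K -> 512K bucket` lines and one `64K -> 64K bucket` line. Under these assumptions a 1 MB RPC turns into four disk I/Os, which would be consistent with the large 4-fragment row and the roughly 3:1 ratio of 512K to 64K entries in the quoted tables.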
> You may want to instrument your HBA driver to see what is going on (i.e.,
> why the max_hw_sectors_kb is < 1024).
>
> Kevin
>
>
> Kevin Hildebrand wrote:
>> Hi, I'm having some performance issues on my Lustre filesystem and it
>> looks to me like it's related to I/Os getting fragmented before being
>> written to disk, but I can't figure out why. This system is RHEL5,
>> running Lustre 1.8.4.
>>
>> All of my OSTs look pretty much the same-
>>
>>                               read       |        write
>> pages per bulk r/w        rpcs   % cum % |     rpcs   % cum %
>> 1:                       88811  38    38 |    46375  17    17
>> 2:                        1497   0    38 |     7733   2    20
>> 4:                        1161   0    39 |     1840   0    21
>> 8:                        1168   0    39 |     7148   2    24
>> 16:                        922   0    40 |     3297   1    25
>> 32:                        979   0    40 |     7602   2    28
>> 64:                       1576   0    41 |     9046   3    31
>> 128:                      7063   3    44 |    16284   6    37
>> 256:                    129282  55   100 |   162090  62   100
>>
>>
>>                               read       |        write
>> disk fragmented I/Os       ios   % cum % |      ios   % cum %
>> 0:                       51181  22    22 |        0   0     0
>> 1:                       45280  19    42 |    82206  31    31
>> 2:                       16615   7    49 |    29108  11    42
>> 3:                        3425   1    50 |    17392   6    49
>> 4:                      110445  48    98 |   129481  49    98
>> 5:                        1661   0    99 |     2702   1    99
>>
>>                               read       |        write
>> disk I/O size              ios   % cum % |      ios   % cum %
>> 4K:                      45889   8     8 |    56240   7     7
>> 8K:                       3658   0     8 |     6416   0     8
>> 16K:                      7956   1    10 |     4703   0     9
>> 32K:                      4527   0    11 |    11951   1    10
>> 64K:                    114369  20    31 |   134128  18    29
>> 128K:                     5095   0    32 |    17229   2    31
>> 256K:                     7164   1    33 |    30826   4    35
>> 512K:                   369512  66   100 |   465719  64   100
>>
>> Oddly, there's no 1024K row in the I/O size table...
>>
>>
>> ...and these seem small to me as well, but I can't seem to change them.
>> Writing new values to either doesn't change anything.
>>
>> # cat /sys/block/sdb/queue/max_hw_sectors_kb
>> 320
>> # cat /sys/block/sdb/queue/max_sectors_kb
>> 320
>>
>> Hardware in question is DELL PERC 6/E and DELL PERC H800 RAID
>> controllers, with MD1000 and MD1200 arrays, respectively.
>>
>>
>> Any clues on where I should look next?
>>
>> Thanks,
>>
>> Kevin
>>
>> Kevin Hildebrand
>> University of Maryland, College Park
>> Office of Information Technology
>
>