[Lustre-discuss] Fragmented I/O

Kevin Hildebrand kevin at umd.edu
Thu May 12 04:11:29 PDT 2011


The PERC 6 and H800 use megaraid_sas; I'm currently running 
00.00.04.17-RH1.

The max_sectors values (320) are what gets set by default. I can set 
max_sectors_kb to something smaller than 320, but not larger.
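[For illustration: the behavior described here matches the range check the kernel applies when max_sectors_kb is written. A value outside the window between the page size and max_hw_sectors_kb is rejected with EINVAL, which is why nothing above 320 sticks. A minimal sketch of that rule, with the limits hard-coded to match this thread rather than read from a live system:]

```shell
# Sketch of the kernel's check in queue_max_sectors_store():
# a new max_sectors_kb must lie between the page size and
# max_hw_sectors_kb, otherwise the sysfs write fails with EINVAL.
# Both limits below are assumptions taken from this thread.
max_hw_sectors_kb=320   # cap reported by the HBA driver
page_kb=4               # smallest permitted value (one page)

set_max_sectors_kb() {
    req=$1
    if [ "$req" -lt "$page_kb" ] || [ "$req" -gt "$max_hw_sectors_kb" ]; then
        echo "write failed: EINVAL (allowed range ${page_kb}-${max_hw_sectors_kb} KB)"
        return 1
    fi
    echo "max_sectors_kb set to $req"
}

set_max_sectors_kb 1024   # rejected: above max_hw_sectors_kb
set_max_sectors_kb 256    # accepted: at or below the hardware cap
```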

Kevin

On Wed, 11 May 2011, Kevin Van Maren wrote:

> You didn't say, but I think they are LSI-based: are you using the mptsas
> driver with the PERC cards?  Which driver version?
>
> First, max_sectors_kb should normally be set to a power-of-two value,
> like 256, rather than an odd size like 320.  This number should also
> match the native RAID stripe size of the device, to avoid
> read-modify-write cycles.  (See Bug 22886 on why not to make it > 1024
> in general.)
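[To make the alignment point above concrete: a full-stripe write is the per-disk chunk size times the number of data disks, and any I/O that is not a multiple of that forces the controller to read-modify-write the partial stripe. The RAID geometry below is an assumption for illustration, not the actual MD1000/MD1200 configuration:]

```shell
# Hypothetical RAID geometry (assumed, for illustration only):
chunk_kb=64     # per-disk chunk size
data_disks=4    # data disks in the stripe (e.g. a 4+2 RAID6)
stripe_kb=$((chunk_kb * data_disks))
echo "full-stripe write: ${stripe_kb} KB"

# A 256 KB I/O lands on a stripe boundary; a 320 KB I/O does not,
# so the controller must read-modify-write the partial stripe.
for io_kb in 256 320; do
    if [ $((io_kb % stripe_kb)) -eq 0 ]; then
        echo "${io_kb} KB: full stripe, no RMW"
    else
        echo "${io_kb} KB: partial stripe, read-modify-write"
    fi
done
```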
>
> See Bug 17086 for patches to increase the max_sectors_kb limitation for
> the mptsas driver to 1MB, or the true hardware maximum, rather than a
> driver limit; however, the hardware may still be limited to sizes < 1MB.
>
> Also, to clarify the sizes: the smallest bucket >= transfer_size is the
> one incremented, so a 320KB IO increments the 512KB bucket.  Since your
> HW says it can only do a 320KB IO, there will never be a 1MB IO.
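[The bucketing rule above, where each I/O is counted in the smallest power-of-two bucket that is >= its size, can be sketched as:]

```shell
# brw_stats-style bucketing: an I/O lands in the smallest
# power-of-two bucket >= its size, so a 320 KB transfer is
# counted under 512K even though no 512 KB I/O occurred.
bucket_kb() {
    b=4                          # smallest bucket is 4K (one page)
    while [ "$b" -lt "$1" ]; do
        b=$((b * 2))
    done
    echo "$b"
}

echo "320 KB I/O -> $(bucket_kb 320)K bucket"
echo "256 KB I/O -> $(bucket_kb 256)K bucket"
```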
>
> You may want to instrument your HBA driver to see what is going on
> (i.e., why max_hw_sectors_kb is < 1024).
>
> Kevin
>
>
> Kevin Hildebrand wrote:
>> Hi, I'm having some performance issues on my Lustre filesystem and it
>> looks to me like it's related to I/Os getting fragmented before being
>> written to disk, but I can't figure out why.  This system is RHEL5,
>> running Lustre 1.8.4.
>>
>> All of my OSTs look pretty much the same-
>>
>>                             read      |     write
>> pages per bulk r/w     rpcs  % cum % |  rpcs  % cum %
>> 1:                   88811  38  38   | 46375  17  17
>> 2:                    1497   0  38   | 7733   2  20
>> 4:                    1161   0  39   | 1840   0  21
>> 8:                    1168   0  39   | 7148   2  24
>> 16:                    922   0  40   | 3297   1  25
>> 32:                    979   0  40   | 7602   2  28
>> 64:                   1576   0  41   | 9046   3  31
>> 128:                  7063   3  44   | 16284   6  37
>> 256:                129282  55 100   | 162090  62 100
>>
>>
>>                             read      |     write
>> disk fragmented I/Os   ios   % cum % |  ios   % cum %
>> 0:                   51181  22  22   |    0   0   0
>> 1:                   45280  19  42   | 82206  31  31
>> 2:                   16615   7  49   | 29108  11  42
>> 3:                    3425   1  50   | 17392   6  49
>> 4:                  110445  48  98   | 129481  49  98
>> 5:                    1661   0  99   | 2702   1  99
>>
>>                             read      |     write
>> disk I/O size          ios   % cum % |  ios   % cum %
>> 4K:                  45889   8   8   | 56240   7   7
>> 8K:                   3658   0   8   | 6416   0   8
>> 16K:                  7956   1  10   | 4703   0   9
>> 32K:                  4527   0  11   | 11951   1  10
>> 64K:                114369  20  31   | 134128  18  29
>> 128K:                 5095   0  32   | 17229   2  31
>> 256K:                 7164   1  33   | 30826   4  35
>> 512K:               369512  66 100   | 465719  64 100
>>
>> Oddly, there's no 1024K row in the I/O size table...
>>
>>
>> ...and these values seem small to me as well, but I can't change them:
>> writing new values to either file has no effect.
>>
>> # cat /sys/block/sdb/queue/max_hw_sectors_kb
>> 320
>> # cat /sys/block/sdb/queue/max_sectors_kb
>> 320
>>
>> Hardware in question is DELL PERC 6/E and DELL PERC H800 RAID
>> controllers, with MD1000 and MD1200 arrays, respectively.
>>
>>
>> Any clues on where I should look next?
>>
>> Thanks,
>>
>> Kevin
>>
>> Kevin Hildebrand
>> University of Maryland, College Park
>> Office of Information Technology
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>
>
>


