[Lustre-discuss] disk fragmented I/Os
Kevin Van Maren
kevin.van.maren at oracle.com
Wed Mar 31 04:55:42 PDT 2010
Lu Wang wrote:
> Dear list,
> We have a brw_stats result for one OST covering the time since it came up. According to these statistics, 50% of disk I/Os are fragmented. I found an earlier discussion on this list that referred to this question:
> http://lists.lustre.org/pipermail/lustre-discuss/2009-August/011433.html
> It seems it is ideal to have 100% of disk I/Os with fragment "1" or "0". I don't know why the I/Os are fragmented, since I found that max_sectors_kb is big enough (16MB?) for the biggest disk I/O size (according to brw_stats, it is 1MB)
>
> # cat /sys/block/sda/queue/max_sectors_kb
> 32767
> # cat /sys/block/sda/queue/max_hw_sectors_kb
> 32767
>
So the drive is limited to 32MB IOs. The statistics below clearly show
that you are seeing fragmentation, so the question becomes: why are the
IOs being broken up? It is very unlikely that Lustre is breaking up the
IO willingly, so most likely something in the IO stack is restricting
the IO sizes.
What are you using for an OST, and what controller/driver/driver version?
What version of Lustre, and what version of Linux are you using on the
OSS node?
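Assuming a typical Linux OSS, those questions can be answered with a few commands. This is only a sketch: the device names and the Lustre version path are examples and vary by distribution and Lustre release.

```shell
# Collect the basics asked about above (paths are examples only)
uname -r                                      # kernel version on the OSS
cat /proc/fs/lustre/version 2>/dev/null \
    || echo "Lustre version file not present here"
lspci 2>/dev/null | grep -i -E 'raid|scsi|sas' \
    || echo "no RAID/SCSI/SAS controller listed"
```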
> read | write
> pages per bulk r/w rpcs % cum % | rpcs % cum %
> 128: 4976083 20 45 | 198864 4 13
> 256: 13457144 54 100 | 3522333 86 100
>
So the clients are doing 1MB RPCs to the server (which is good).
> read | write
> disk fragmented I/Os ios % cum % | ios % cum %
> 0: 9821 0 0 | 0 0 0
> 1: 11933478 48 48 | 630964 15 15
> 2: 12726392 51 99 | 3350479 82 97
> 3: 155476 0 99 | 84465 2 99
>
But all your IOs are being broken in half.
> read | write
> disk I/Os in flight ios % cum % | ios % cum %
> 1: 10954265 28 28 | 3781021 49 49
> 2: 9217023 24 53 | 3329128 43 93
> 3: 6063548 15 69 | 272981 3 97
>
This is really bad -- it seems that only one write at a time is ever
issued to the disks. Lustre would normally issue up to 31, so there may
be something about your disk or driver preventing multiple outstanding
IOs.
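Two sysfs knobs worth checking in this situation are the block layer's request-queue size and the device's own queue depth; a quick look (with "sda" as an example device name, matching the output quoted above):

```shell
# Inspect the limits that could serialize writes ("sda" is an example)
dev=sda
for f in "/sys/block/$dev/queue/nr_requests" \
         "/sys/block/$dev/device/queue_depth"; do
    if [ -r "$f" ]; then
        printf '%s: %s\n' "$f" "$(cat "$f")"
    else
        printf '%s: not present on this system\n' "$f"
    fi
done
```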
> read | write
> disk I/O size ios % cum % | ios % cum %
> 256K: 4464264 11 29 | 288737 3 11
> 512K: 24846133 65 94 | 5997373 78 90
> 1M: 1951161 5 100 | 747214 9 100
>
Basically this is saying that nearly all 1MB IOs are being broken into
512KB pieces.
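The arithmetic behind that conclusion can be sketched in a couple of lines. The 512 KB figure here is the effective per-I/O limit inferred from the histogram above, not a value read from any configuration file:

```shell
# A 1 MB bulk RPC split by a hypothetical 512 KB effective I/O limit
rpc_kb=1024          # client bulk RPC size (256 pages x 4 KB)
max_io_kb=512        # effective per-I/O limit inferred from the histogram
frags=$(( (rpc_kb + max_io_kb - 1) / max_io_kb ))   # ceiling division
echo "disk I/Os per RPC: $frags"    # prints "disk I/Os per RPC: 2"
```

A result of 2 matches the "2 fragments" bucket that dominates the fragmentation histogram.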
> I have 2 questions:
> 1. Could anyone explain what these parameters actually mean?
> /sys/block/sda/queue/max_sectors_kb /sys/block/sda/queue/max_hw_sectors_kb
How large an IO can be sent (allowed) to the disk, and how large an IO
the disk drive supports, respectively.
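As a trivial sanity check on the relationship between the two (using the values copied from the sysfs output quoted above), the tunable limit must stay at or below the hardware limit:

```shell
# The tunable soft limit can never usefully exceed the hardware limit
max_kb=32767       # /sys/block/sda/queue/max_sectors_kb (tunable)
max_hw_kb=32767    # /sys/block/sda/queue/max_hw_sectors_kb (read-only)
if [ "$max_kb" -le "$max_hw_kb" ]; then
    echo "soft limit ${max_kb} KB <= hardware limit ${max_hw_kb} KB"
fi
```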
> ,disk fragmented I/Os, disk I/O size of brw_stats
>
How many pieces each ldiskfs write is broken into, and the size of
those pieces.
> 2. In which cases will the disk I/O be fragmented?
>
>
> Thanks a lot in advance!
>
> Best Regards
> Lu Wang
>