[Lustre-discuss] disk fragmented I/Os

Kevin Van Maren kevin.van.maren at oracle.com
Wed Mar 31 04:55:42 PDT 2010


Lu Wang wrote:
> Dear list,
> 	 We have brw_stats output for one OST, accumulated since it came up. According to these statistics, 50% of disk I/Os are fragmented. I found an earlier discussion on this list that referred to this question:
> http://lists.lustre.org/pipermail/lustre-discuss/2009-August/011433.html
> It seems it is ideal to have 100% of disk I/Os with fragment "1" or "0". I don't know why the I/Os are fragmented, since max_sectors_kb looks big enough (16MB?) for the biggest disk I/O size (according to brw_stats, it is 1MB).
>
> # cat /sys/block/sda/queue/max_sectors_kb 
> 32767
> # cat /sys/block/sda/queue/max_hw_sectors_kb 
> 32767
>   

So the drive is limited to 32MB IOs.  Below it is clear that you are 
seeing fragmentation, so the question becomes: why are the IOs being 
broken up?  It is very unlikely that Lustre is breaking up the IOs 
deliberately, so most likely something in the IO stack is restricting 
the IO size.
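
One way to narrow that down (a sketch, assuming there may be md/dm 
layers under the OST -- adjust the device names to your setup) is to 
compare the advertised limits at every layer of the stack, since the 
smallest limit along the path wins:

# Request-size limits for each block device: software cap / hardware cap
for q in /sys/block/*/queue; do
    echo "$q: $(cat $q/max_sectors_kb) / $(cat $q/max_hw_sectors_kb) KB"
done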

What are you using for an OST, and what controller/driver/driver version?

What version of Lustre, and what version of Linux are you using on the 
OSS node?

>                            read      |     write
> pages per bulk r/w     rpcs  % cum % |  rpcs  % cum %
> 128:               4976083  20  45   | 198864   4  13
> 256:              13457144  54 100   | 3522333  86 100
>   
So the clients are doing 1MB RPCs to the server (which is good).
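
That matches the default client setting: 256 pages x 4KB = 1MB per RPC. 
If you want to confirm it on a client (a sketch; the OSC instance names 
will differ on your system):

# On a client: bulk RPC size in pages for each OSC (256 x 4KB = 1MB)
lctl get_param osc.*.max_pages_per_rpc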

>                            read      |     write
> disk fragmented I/Os   ios   % cum % |  ios   % cum %
> 0:                    9821   0   0   |    0   0   0
> 1:                11933478  48  48   | 630964  15  15
> 2:                12726392  51  99   | 3350479  82  97
> 3:                  155476   0  99   | 84465   2  99
>   
But about half of your reads and most of your writes are being broken 
in two.
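
A fragment count of 2 means a single 1MB bulk transfer reached the disk 
as two separate requests. If the OST sits on software RAID, the 
chunk/stripe geometry is one thing worth ruling out (an assumption -- 
/dev/md0 is a hypothetical name; skip this if you are not using md):

# Check whether the RAID chunk size divides the 1MB IOs cleanly
mdadm --detail /dev/md0 | grep -i chunk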

>                            read      |     write
> disk I/Os in flight    ios   % cum % |  ios   % cum %
> 1:                10954265  28  28   | 3781021  49  49
> 2:                 9217023  24  53   | 3329128  43  93
> 3:                 6063548  15  69   | 272981   3  97
>   

This is really bad -- it seems that writes are almost never more than 
one or two deep at the disk.  Lustre would normally issue up to 31 
concurrent IOs, so there may be something about your disk or driver 
preventing multiple outstanding IOs.
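
Two quick places to look (sda is an example device name):

# Outstanding-command limit at the SCSI device; a small value here
# would explain the in-flight histogram
cat /sys/block/sda/device/queue_depth

# Block-layer request queue depth
cat /sys/block/sda/queue/nr_requests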

>                            read      |     write
> disk I/O size          ios   % cum % |  ios   % cum %
> 256K:              4464264  11  29   | 288737   3  11
> 512K:             24846133  65  94   | 5997373  78  90
> 1M:                1951161   5 100   | 747214   9 100
>   

Basically this is saying that nearly all 1MB IOs are being broken into 
512KB pieces.
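
One limit that max_sectors_kb does not cover is the driver's 
scatter-gather segment limit: with 4KB pages, a 128-segment cap holds 
every request to 512KB no matter how large max_sectors_kb is. A check 
worth trying (this sysfs attribute only exists on newer kernels, which 
is an assumption here; sda is an example):

# Scatter-gather segments per request: 128 x 4KB pages = 512KB
cat /sys/block/sda/queue/max_segments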

> I have 2 questions: 
> 1. Could anyone explain what exactly these parameters mean?
>   /sys/block/sda/queue/max_sectors_kb  /sys/block/sda/queue/max_hw_sectors_kb
max_sectors_kb is the largest IO the kernel will issue to the device (a 
tunable software limit); max_hw_sectors_kb is the largest IO the 
hardware itself supports, which caps the former.
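
Since max_sectors_kb is writable and clamped to max_hw_sectors_kb, you 
can raise the software cap toward the hardware cap if it turns out to 
be the limiting factor (a sketch; sda and the value 1024 are examples, 
not recommendations):

# Raise the software request-size cap (must not exceed max_hw_sectors_kb)
echo 1024 > /sys/block/sda/queue/max_sectors_kb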

> and the "disk fragmented I/Os" and "disk I/O size" fields of brw_stats?
>   
How many pieces each ldiskfs IO is broken into, and the size of those 
pieces.
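
If you change anything in the stack, you can reset the histograms and 
re-measure, rather than staring at numbers accumulated since boot (a 
sketch; as far as I know, writing to the stats file clears it):

# On the OSS: clear brw_stats, run a test load, then read them again
lctl set_param obdfilter.*.brw_stats=0
lctl get_param obdfilter.*.brw_stats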

> 2. In which cases will the disk I/O be fragmented?
>
>
> Thanks a lot in advance!
>
> Best Regards
> Lu Wang



