[Lustre-discuss] Re: Re: disk fragmented I/Os

Lu Wang wanglu at ihep.ac.cn
Wed Mar 31 19:55:24 PDT 2010


We are using Lustre 1.8.1.1 on kernel 2.6.18-128.7.1.el5. The disk controller is a NetStor_iSUM510, and the driver is qla2xxx (8.02.00.06.05.03-k).
	
We made two partitions on each disk volume, because Lustre cannot support an OST larger than 8TB:

/dev/sda1            4325574520 3911425648 194422312  96% /lustre/ost1
/dev/sda2            4324980788 3898888124 206396204  95% /lustre/ost2
/dev/sdb1            4325574520 3909042320 196805640  96% /lustre/ost3
/dev/sdb2            4324980788 3920306524 184977804  96% /lustre/ost4
/dev/sdc1            4325574520 3868328108 237519852  95% /lustre/ost5
/dev/sdc2            4324980788 3921774384 183509944  96% /lustre/ost6
/dev/sdd1            4325574520 3911662272 194185688  96% /lustre/ost7
/dev/sdd2            4324980788 3884415428 220868900  95% /lustre/ost8

	
	

------------------				 
Lu Wang
2010-04-01

-------------------------------------------------------------
From: Kevin Van Maren
Sent: 2010-03-31 19:56:59
To: Lu Wang
Cc: lustre-discuss
Subject: Re: [Lustre-discuss] disk fragmented I/Os

Lu Wang wrote:
> Dear list,
> We have a brw_stats result for one OST covering the time since it came up. According to these statistics, about 50% of disk I/Os are fragmented. I found an earlier discussion on this list about this question:
> http://lists.lustre.org/pipermail/lustre-discuss/2009-August/011433.html
> It seems ideal to have 100% of disk I/Os with fragment "1" or "0". I don't know why the I/Os are fragmented, since max_sectors_kb is big enough (16MB?) for the biggest disk I/O size (1MB according to brw_stats).
>
> # cat /sys/block/sda/queue/max_sectors_kb 
> 32767
> # cat /sys/block/sda/queue/max_hw_sectors_kb 
> 32767
>   

So the drive is limited to 32MB IOs.  Below it is clear that you are 
seeing fragmentation, so the question becomes why are the IOs being 
broken up?  It is very unlikely Lustre is breaking up the IO willingly, 
so most likely something in the IO stack is restricting the IO sizes.

What are you using for an OST, and what controller/driver/driver version?

What version of Lustre, and what version of Linux are you using on the 
OSS node?

>                            read      |     write
> pages per bulk r/w     rpcs  % cum % |  rpcs  % cum %
> 128:               4976083  20  45   | 198864   4  13
> 256:              13457144  54 100   | 3522333  86 100
>   
So the clients are doing 1MB RPCs to the server (which is good).
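
As a quick check of that arithmetic, here is a minimal sketch that converts
the "pages per bulk r/w" buckets into sizes. It assumes 4 KiB pages, which is
the usual page size on x86 Linux but is not stated in the thread; the function
name is made up for illustration.

```python
# Assumption: 4 KiB pages (typical for x86 Linux kernels, not stated in the thread).
PAGE_SIZE_BYTES = 4096


def rpc_size_bytes(pages_per_rpc, page_size=PAGE_SIZE_BYTES):
    """Return the payload size in bytes of a bulk RPC carrying this many pages."""
    return pages_per_rpc * page_size


if __name__ == "__main__":
    # The two buckets quoted from brw_stats above.
    for pages in (128, 256):
        size_kib = rpc_size_bytes(pages) // 1024
        print(f"{pages} pages per bulk r/w = {size_kib} KiB")
```

Under that assumption the 256-page bucket corresponds to 1 MiB RPCs and the
128-page bucket to 512 KiB RPCs.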

>                            read      |     write
> disk fragmented I/Os   ios   % cum % |  ios   % cum %
> 0:                    9821   0   0   |    0   0   0
> 1:                11933478  48  48   | 630964  15  15
> 2:                12726392  51  99   | 3350479  82  97
> 3:                  155476   0  99   | 84465   2  99
>   
But most of your IOs are being broken in half: 51% of reads and 82% of
writes went to disk in 2 fragments.

>                            read      |     write
> disk I/Os in flight    ios   % cum % |  ios   % cum %
> 1:                10954265  28  28   | 3781021  49  49
> 2:                 9217023  24  53   | 3329128  43  93
> 3:                 6063548  15  69   | 272981   3  97
>   

This is really bad -- it seems that it rarely issues more than one or
two writes at a time to the disks (93% of writes had at most 2 IOs in
flight). Lustre would normally issue up to 31, so there may be something
about your disk or driver preventing multiple outstanding IOs.

>                            read      |     write
> disk I/O size          ios   % cum % |  ios   % cum %
> 256K:              4464264  11  29   | 288737   3  11
> 512K:             24846133  65  94   | 5997373  78  90
> 1M:                1951161   5 100   | 747214   9 100
>   

Basically this is saying that nearly all 1MB IOs are being broken into 
512KB pieces.

> I have 2 questions: 
> 1. Could anyone explain what exactly these parameters mean?
>   /sys/block/sda/queue/max_sectors_kb  /sys/block/sda/queue/max_hw_sectors_kb
max_sectors_kb is how large an IO (in KB) is currently allowed to be sent
to the disk, and max_hw_sectors_kb is how large an IO the disk drive
supports. The first can be tuned, but not above the second.
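
A minimal sketch of checking those two limits from a script, for anyone who
wants to compare them against the RPC size. The function names and the
sysfs_root parameter are made up for illustration (the parameter only exists
so the code can be pointed at a test directory); the sysfs paths are the ones
shown in the question, and "sda" is the device from the thread.

```python
from pathlib import Path


def read_queue_limit_kb(device, name, sysfs_root="/sys/block"):
    """Return an integer queue attribute (in KB), or None if it cannot be read."""
    path = Path(sysfs_root) / device / "queue" / name
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def fits_in_one_request(io_size_kb, device, sysfs_root="/sys/block"):
    """True if io_size_kb is within max_sectors_kb, None if the limit is unknown."""
    limit = read_queue_limit_kb(device, "max_sectors_kb", sysfs_root)
    if limit is None:
        return None
    return io_size_kb <= limit


if __name__ == "__main__":
    dev = "sda"  # device name taken from the thread
    for attr in ("max_sectors_kb", "max_hw_sectors_kb"):
        print(attr, read_queue_limit_kb(dev, attr))
    print("1 MB IO fits:", fits_in_one_request(1024, dev))
```

A True result only rules out this one limit. Other restrictions in the IO
stack can still split a request, which is why the controller and driver
details matter here.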

> and the "disk fragmented I/Os" and "disk I/O size" sections of brw_stats?
>   
How many pieces each ldiskfs write is broken into, and the size of the
pieces.
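
To put a number on it, here is a rough sketch that parses the "disk
fragmented I/Os" rows quoted earlier and computes what share of IOs used more
than one fragment. The parser and function names are made up for
illustration, and the row layout is assumed from the excerpt in this thread
rather than from any format specification. The excerpt may not list every
bucket, so the shares are over the listed rows only.

```python
SAMPLE = """\
>                            read      |     write
> disk fragmented I/Os   ios   % cum % |  ios   % cum %
> 0:                    9821   0   0   |    0   0   0
> 1:                11933478  48  48   | 630964  15  15
> 2:                12726392  51  99   | 3350479  82  97
> 3:                  155476   0  99   | 84465   2  99
"""


def parse_histogram(text):
    """Parse histogram rows into {bucket: (read_ios, write_ios)}."""
    rows = {}
    for line in text.splitlines():
        line = line.lstrip("> ").strip()  # drop mail quoting and padding
        if ":" not in line or "|" not in line:
            continue  # header or blank line
        bucket, rest = line.split(":", 1)
        read_part, write_part = rest.split("|", 1)
        try:
            counts = (int(read_part.split()[0]), int(write_part.split()[0]))
        except (IndexError, ValueError):
            continue  # not a data row
        rows[bucket.strip()] = counts
    return rows


def fragmented_share(rows, column):
    """Fraction of listed IOs that used more than one fragment (0=read, 1=write)."""
    total = sum(counts[column] for counts in rows.values())
    split = sum(counts[column] for bucket, counts in rows.items() if int(bucket) > 1)
    return split / total if total else 0.0


if __name__ == "__main__":
    rows = parse_histogram(SAMPLE)
    print(f"reads fragmented:  {fragmented_share(rows, 0):.1%}")
    print(f"writes fragmented: {fragmented_share(rows, 1):.1%}")
```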

> 2. In which cases will disk I/O be fragmented?
>
>
> Thanks a lot in advance!
>
> Best Regards
> Lu Wang
> 	

