[Lustre-discuss] HW RAID - fragmented I/O
Wojciech Turek
wjt27 at cam.ac.uk
Wed Jun 8 08:53:31 PDT 2011
I am setting up a new Lustre filesystem using LSI Engenio based disk
enclosures with integrated dual RAID controllers. I configured the disks
into 8+2 RAID6 groups using a 128KB segment size (chunk size). This
hardware uses the mpt2sas kernel module on the Linux host side. I use the
whole block device for an OST (to avoid any alignment issues). When
running sgpdd-survey I see high throughput numbers (~3GB/s write,
~5GB/s read), and the controller stats show that the number of IOPS
equals the number of MB/s. However, as soon as I put ldiskfs on the OSTs,
obdfilter shows slower results (~2GB/s write, ~2GB/s read) and the
controller stats show more than double the IOPS relative to MB/s.
Looking at the output of iostat -m -x 1 and brw_stats, I can see that a
large number of I/O operations are smaller than 1MB, mostly 512KB. I
know that some work was done on optimising the kernel block device layer
to process 1MB I/O requests, and that those changes were committed in
Lustre 1.8.5. Thus I guess this I/O chopping happens below the Lustre
stack, maybe in the mpt2sas driver?
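For reference, the full-stripe arithmetic for this layout (a sketch; the
8+2 geometry and 128KB segment size are the values given above):

```shell
# 8+2 RAID6: only the 8 data disks contribute to the stripe width;
# the 2 parity disks do not add capacity.
data_disks=8
segment_kb=128                                 # controller segment (chunk) size
full_stripe_kb=$((data_disks * segment_kb))
echo "full stripe = ${full_stripe_kb}KB"       # prints: full stripe = 1024KB
# A 1MB request covers exactly one full stripe, so the controller can write
# it without a read-modify-write cycle; a 512KB request covers only half a
# stripe, which would explain IOPS running at roughly double the MB/s.
```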
I am hoping that someone in the Lustre community can shed some light on
my problem.
In my setup I use:
Lustre 1.8.5
CentOS-5.5
Some parameters I tuned from defaults in CentOS:
deadline I/O scheduler
max_hw_sectors_kb=4096
max_sectors_kb=1024
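A quick way to confirm these settings took effect on the block devices (a
sketch; the glob covers all devices, so narrow it to your OST disks as
needed). Note that on the kernels I have seen, max_sectors_kb is writable
at runtime while max_hw_sectors_kb is read-only and reflects a driver/HBA
limit:

```shell
# Print the I/O scheduler and request-size limits for each block device.
for q in /sys/block/*/queue; do
  dev=$(basename "$(dirname "$q")")
  printf '%s: scheduler=%s max_sectors_kb=%s max_hw_sectors_kb=%s\n' \
    "$dev" \
    "$(cat "$q/scheduler")" \
    "$(cat "$q/max_sectors_kb")" \
    "$(cat "$q/max_hw_sectors_kb")"
done
# To apply the tuning at runtime (per device):
#   echo deadline > /sys/block/<dev>/queue/scheduler
#   echo 1024     > /sys/block/<dev>/queue/max_sectors_kb
```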
brw_stats output
--
find /proc/fs/lustre/obdfilter/ -name "testfs-OST*" | while read ost;
do cat "$ost/brw_stats"; done | grep "disk I/O size" -A9
disk I/O size ios % cum % | ios % cum %
4K: 206 0 0 | 521 0 0
8K: 224 0 0 | 595 0 1
16K: 105 0 1 | 479 0 1
32K: 140 0 1 | 1108 1 3
64K: 231 0 1 | 1470 1 4
128K: 536 1 2 | 2259 2 7
256K: 1762 3 6 | 5644 6 14
512K: 31574 64 71 | 30431 35 50
1M: 14200 28 100 | 42143 49 100
--
disk I/O size ios % cum % | ios % cum %
4K: 187 0 0 | 457 0 0
8K: 244 0 0 | 598 0 1
16K: 109 0 1 | 481 0 1
32K: 129 0 1 | 1100 1 3
64K: 222 0 1 | 1408 1 4
128K: 514 1 2 | 2291 2 7
256K: 1718 3 6 | 5652 6 14
512K: 32222 65 72 | 29810 35 49
1M: 13654 27 100 | 42202 50 100
--
disk I/O size ios % cum % | ios % cum %
4K: 196 0 0 | 551 0 0
8K: 206 0 0 | 551 0 1
16K: 79 0 0 | 513 0 1
32K: 136 0 1 | 1048 1 3
64K: 232 0 1 | 1278 1 4
128K: 540 1 2 | 2172 2 7
256K: 1681 3 6 | 5679 6 13
512K: 31842 64 71 | 31705 37 51
1M: 14077 28 100 | 41789 48 100
--
disk I/O size ios % cum % | ios % cum %
4K: 190 0 0 | 486 0 0
8K: 200 0 0 | 547 0 1
16K: 93 0 0 | 448 0 1
32K: 141 0 1 | 1029 1 3
64K: 240 0 1 | 1283 1 4
128K: 558 1 2 | 2125 2 7
256K: 1716 3 6 | 5400 6 13
512K: 31476 64 70 | 29029 35 48
1M: 14366 29 100 | 42454 51 100
--
disk I/O size ios % cum % | ios % cum %
4K: 209 0 0 | 511 0 0
8K: 195 0 0 | 621 0 1
16K: 79 0 0 | 558 0 1
32K: 134 0 1 | 1135 1 3
64K: 245 0 1 | 1390 1 4
128K: 509 1 2 | 2219 2 7
256K: 1715 3 6 | 5687 6 14
512K: 31784 64 71 | 31172 36 50
1M: 14112 28 100 | 41719 49 100
--
disk I/O size ios % cum % | ios % cum %
4K: 201 0 0 | 500 0 0
8K: 241 0 0 | 604 0 1
16K: 82 0 1 | 584 0 1
32K: 130 0 1 | 1092 1 3
64K: 230 0 1 | 1331 1 4
128K: 547 1 2 | 2253 2 7
256K: 1695 3 6 | 5634 6 14
512K: 31501 64 70 | 31836 37 51
1M: 14343 29 100 | 41517 48 100
--
Wojciech Turek