Hi Kevin,

In my kernel .config I find the following lines:

CONFIG_SCSI_MPT2SAS=m
CONFIG_SCSI_MPT2SAS_MAX_SGE=128
CONFIG_SCSI_MPT2SAS_LOGGING=y

I changed the MAX_SGE value to 256. Do I need to recompile the whole kernel before building a new module based on that .config?
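What I had in mind is roughly the following, assuming the configured source tree is under /usr/src/kernels/$(uname -r) (the path is illustrative for my setup):

    cd /usr/src/kernels/$(uname -r)          # illustrative source-tree location
    make oldconfig && make modules_prepare   # regenerate the autogenerated headers from the edited .config
    make M=drivers/scsi/mpt2sas modules      # rebuild only the mpt2sas driver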
On 10 June 2011 13:00, Wojciech Turek <wjt27@cam.ac.uk> wrote:
Hi Kevin,

Thanks for the very helpful answer. I tried your suggestion and recompiled the mpt2sas driver with the following changes:

--- mpt2sas_base.h	2010-01-16 20:57:30.000000000 +0000
+++ new_mpt2sas_base.h	2011-06-10 12:53:35.000000000 +0100
@@ -83,13 +83,13 @@
 #ifdef CONFIG_SCSI_MPT2SAS_MAX_SGE
 #if CONFIG_SCSI_MPT2SAS_MAX_SGE < 16
 #define MPT2SAS_SG_DEPTH 16
-#elif CONFIG_SCSI_MPT2SAS_MAX_SGE > 128
-#define MPT2SAS_SG_DEPTH 128
+#elif CONFIG_SCSI_MPT2SAS_MAX_SGE > 256
+#define MPT2SAS_SG_DEPTH 256
 #else
 #define MPT2SAS_SG_DEPTH CONFIG_SCSI_MPT2SAS_MAX_SGE
 #endif
 #else
-#define MPT2SAS_SG_DEPTH 128 /* MAX_HW_SEGMENTS */
+#define MPT2SAS_SG_DEPTH 256 /* MAX_HW_SEGMENTS */
 #endif

 #if defined(TARGET_MODE)
However, I can still see that almost 50% of writes and slightly over 50% of reads fall under 512KB I/Os. I am using device-mapper-multipath to manage active/passive paths; do you think that could have something to do with the I/O fragmentation?
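One thing I am going to check is whether the multipath device advertises smaller request limits than its underlying paths. A quick sketch (dm-2, sdb and sdc stand in for my actual devices):

    # compare request-size limits on the dm-multipath device and its slave paths
    for dev in dm-2 sdb sdc; do
        echo "$dev: max_sectors_kb=$(cat /sys/block/$dev/queue/max_sectors_kb)" \
             "max_hw_sectors_kb=$(cat /sys/block/$dev/queue/max_hw_sectors_kb)"
    done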
Best regards,

Wojciech

On 8 June 2011 17:30, Kevin Van Maren <kevin.van.maren@oracle.com> wrote:
Yep, with 1.8.5 the problem is most likely in the (mpt2sas) driver, not in the rest of the kernel. Driver limits are not normally noticed by (non-Lustre) people, because the default kernel limits I/O to 512KB.
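You can see that default on any untuned device (sdX is a placeholder):

    # stock kernels cap requests at 1024 sectors = 512KB by default
    cat /sys/block/sdX/queue/max_sectors_kb    # prints 512 on an untuned device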
You may want to see Bug 22850 for the changes required, e.g., for the Emulex/lpfc driver.
Glancing at the stock RHEL5 kernel, it looks like the issue is MPT2SAS_SG_DEPTH, which is limited to 128. This appears to be set to match the default kernel limit, but it is possible there is also a driver/HW limit. You should be able to increase that to 256 and see if it works...
Also note that the size buckets are power-of-2, so a "1MB" entry is any I/O > 512KB and <= 1MB.
If you can't get the driver to reliably do full 1MB I/Os, change to a 64KB chunk and set max_sectors_kb to 512. This will help ensure you get aligned, full-stripe writes.
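To spell out the arithmetic: with 8 data disks per RAID6 group, a 64KB chunk gives a full stripe of 8 x 64KB = 512KB, so capping requests at 512KB makes each request at most one aligned full stripe. Something like (sdX stands for an OST block device):

    # cap requests at one full stripe: 8 data disks x 64KB chunk = 512KB
    echo 512 > /sys/block/sdX/queue/max_sectors_kb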
Kevin

Wojciech Turek wrote:
I am setting up a new Lustre filesystem using LSI Engenio based disk enclosures with integrated dual RAID controllers. I configured the disks into 8+2 RAID6 groups using a 128KB segment size (chunk size). This hardware uses the mpt2sas kernel module on the Linux host side. I use the whole block device for an OST (to avoid any alignment issues). When running sgpdd-survey I can see high throughput numbers (~3GB/s write, ~5GB/s read), and the controller stats show that the number of IOPS equals the number of MB/s, i.e. the average request is 1MB. However, as soon as I put ldiskfs on the OSTs, obdfilter shows slower results (~2GB/s write, ~2GB/s read) and the controller stats show more than double the IOPS per MB/s, i.e. an average request size below 512KB. Looking at the output of iostat -m -x 1 and brw_stats, I can see that a large number of I/O operations are smaller than 1MB, mostly 512KB. I know that there was some work done on optimising the kernel block device layer to process 1MB I/O requests and that those changes were committed to Lustre 1.8.5. Thus I guess this I/O chopping happens below the Lustre stack, maybe in the mpt2sas driver?
I am hoping that someone in the Lustre community can shed some light on my problem.

In my setup I use:
Lustre 1.8.5
CentOS 5.5
Some parameters I tuned from the defaults in CentOS (applied as sketched after this list):

deadline I/O scheduler
max_hw_sectors_kb=4096
max_sectors_kb=1024
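A minimal sketch of applying these, with sdX as a placeholder for each OST block device (max_hw_sectors_kb is normally the read-only driver/hardware ceiling, so it is only verified here):

    # use the deadline elevator on each OST device
    echo deadline > /sys/block/sdX/queue/scheduler
    # let the block layer issue requests up to 1MB
    echo 1024 > /sys/block/sdX/queue/max_sectors_kb
    # verify the hardware ceiling reported by the driver
    cat /sys/block/sdX/queue/max_hw_sectors_kb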
brw_stats output (left-hand columns are reads, right-hand columns are writes):
--

find /proc/fs/lustre/obdfilter/ -name "testfs-OST*" | while read ost; do cat $ost/brw_stats; done | grep "disk I/O size" -A9
disk I/O size     ios   %  cum %  |    ios   %  cum %
4K:               206   0      0  |    521   0      0
8K:               224   0      0  |    595   0      1
16K:              105   0      1  |    479   0      1
32K:              140   0      1  |   1108   1      3
64K:              231   0      1  |   1470   1      4
128K:             536   1      2  |   2259   2      7
256K:            1762   3      6  |   5644   6     14
512K:           31574  64     71  |  30431  35     50
1M:             14200  28    100  |  42143  49    100
--
disk I/O size     ios   %  cum %  |    ios   %  cum %
4K:               187   0      0  |    457   0      0
8K:               244   0      0  |    598   0      1
16K:              109   0      1  |    481   0      1
32K:              129   0      1  |   1100   1      3
64K:              222   0      1  |   1408   1      4
128K:             514   1      2  |   2291   2      7
256K:            1718   3      6  |   5652   6     14
512K:           32222  65     72  |  29810  35     49
1M:             13654  27    100  |  42202  50    100
--
disk I/O size     ios   %  cum %  |    ios   %  cum %
4K:               196   0      0  |    551   0      0
8K:               206   0      0  |    551   0      1
16K:               79   0      0  |    513   0      1
32K:              136   0      1  |   1048   1      3
64K:              232   0      1  |   1278   1      4
128K:             540   1      2  |   2172   2      7
256K:            1681   3      6  |   5679   6     13
512K:           31842  64     71  |  31705  37     51
1M:             14077  28    100  |  41789  48    100
--
disk I/O size     ios   %  cum %  |    ios   %  cum %
4K:               190   0      0  |    486   0      0
8K:               200   0      0  |    547   0      1
16K:               93   0      0  |    448   0      1
32K:              141   0      1  |   1029   1      3
64K:              240   0      1  |   1283   1      4
128K:             558   1      2  |   2125   2      7
256K:            1716   3      6  |   5400   6     13
512K:           31476  64     70  |  29029  35     48
1M:             14366  29    100  |  42454  51    100
--
disk I/O size     ios   %  cum %  |    ios   %  cum %
4K:               209   0      0  |    511   0      0
8K:               195   0      0  |    621   0      1
16K:               79   0      0  |    558   0      1
32K:              134   0      1  |   1135   1      3
64K:              245   0      1  |   1390   1      4
128K:             509   1      2  |   2219   2      7
256K:            1715   3      6  |   5687   6     14
512K:           31784  64     71  |  31172  36     50
1M:             14112  28    100  |  41719  49    100
--
disk I/O size     ios   %  cum %  |    ios   %  cum %
4K:               201   0      0  |    500   0      0
8K:               241   0      0  |    604   0      1
16K:               82   0      1  |    584   0      1
32K:              130   0      1  |   1092   1      3
64K:              230   0      1  |   1331   1      4
128K:             547   1      2  |   2253   2      7
256K:            1695   3      6  |   5634   6     14
512K:           31501  64     70  |  31836  37     51
1M:             14343  29    100  |  41517  48    100