[Lustre-discuss] hacking max_sectors

Robin Humble robin.humble+lustre at anu.edu.au
Tue Aug 25 21:46:03 PDT 2009


Hiya,

I've had another go at fixing the problem I was seeing a few months ago:
  http://lists.lustre.org/pipermail/lustre-discuss/2009-April/010315.html
and which we are seeing again now as we set up a new machine
with md software RAID6 8+2 and 128k chunks, e.g.
  Lustre: test-OST000d: underlying device md5 should be tuned for larger I/O requests: max_sectors = 1024 could be up to max_hw_sectors=2560 

I came up with the attached simple core kernel change which fixes the
problem, and seems stable enough under initial stress testing, but a
core scsi tweak seems a little drastic to me - is there a better way to
do it?

without this patch, and despite raising all disks to a ridiculously
huge max_sectors_kb, all Lustre 1M RPCs are still fragmented into two
512k chunks before being sent to md :-/ likely md then aggregates them
again 'cos performance isn't totally dismal, which it would be if it was
100% read-modify-writes for each stripe write.
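
To make the arithmetic behind that fragmentation explicit, here is a rough sketch, assuming 512-byte sectors and a deliberately simplified model in which a request is cut purely at the max_sectors limit (the real block layer also applies segment limits). The helper name split_request is hypothetical, purely for illustration.

```python
SECTOR_BYTES = 512  # assumption: limits are counted in 512-byte sectors


def split_request(request_bytes, max_sectors):
    """Cut one I/O into pieces no larger than max_sectors sectors.

    Simplified model only: segment-count and segment-size limits
    are ignored here.
    """
    limit = max_sectors * SECTOR_BYTES
    pieces = []
    remaining = request_bytes
    while remaining > 0:
        piece = min(remaining, limit)
        pieces.append(piece)
        remaining -= piece
    return pieces


RPC_BYTES = 1024 * 1024  # one 1M Lustre RPC

for max_sectors in (1024, 2048):
    sizes_kib = [p // 1024 for p in split_request(RPC_BYTES, max_sectors)]
    print(f"max_sectors={max_sectors}: {sizes_kib} KiB")
```

With a limit of 1024 sectors this gives two 512 KiB pieces, matching what is described above; with 2048 sectors the 1M request stays whole.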

with the patch, 1M i/o's are being fed to md (according to brw_stats),
and performance is a little better for RAID6 8+2 with 128k chunks, and
a bit worse for RAID6 8+2 with 64k chunks (which are curiously now fed
half 512k and half 1M i/o's by Lustre).
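
For reference, the stripe geometry of the two layouts can be worked out as follows. This is only a sketch, assuming that "8+2" means 8 data disks plus 2 parity disks and that I/Os are stripe-aligned; the function names are made up for illustration, and it does not attempt to explain the measured performance difference.

```python
DATA_DISKS = 8  # assumption: RAID6 8+2 = 8 data disks + 2 parity disks


def full_stripe_kib(chunk_kib, data_disks=DATA_DISKS):
    """Data carried by one full stripe, in KiB."""
    return chunk_kib * data_disks


def stripes_covered(io_kib, chunk_kib, data_disks=DATA_DISKS):
    """Full stripes covered by one aligned I/O (below 1.0 means partial)."""
    return io_kib / full_stripe_kib(chunk_kib, data_disks)


for chunk_kib in (128, 64):
    for io_kib in (512, 1024):
        print(
            f"chunk={chunk_kib}k io={io_kib}k "
            f"full_stripe={full_stripe_kib(chunk_kib)}k "
            f"stripes={stripes_covered(io_kib, chunk_kib)}"
        )
```

Under these assumptions a full stripe is 1024 KiB with 128k chunks and 512 KiB with 64k chunks, so a 512k request is only half a stripe in the 128k layout but a whole stripe in the 64k layout.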

the one-liner is a core kernel change, so perhaps some Lustre/kernel
block device/md people can look at it and see if it's acceptable for
inclusion in standard Lustre OSS kernels, or whether it breaks
assumptions in the core scsi layer somehow.

IMHO the best solution would be to apply the patch, and then have a
/sys/block/md*/queue/ for md devices so that max_sectors_kb and
max_hw_sectors_kb can be tuned without recompiling the kernel...
is that possible?
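
As a quick way to see which devices currently expose these tunables, something like the following could be used. It is a minimal sketch that only reads the max_sectors_kb and max_hw_sectors_kb files under each device's queue/ directory; the function name read_queue_limits and the idea of passing the sysfs root as a parameter are illustrative choices, not part of any existing tool.

```python
from pathlib import Path


def read_queue_limits(sysfs_block="/sys/block"):
    """Return {device: {limit_name: value_in_KiB}} for devices with a queue/ dir.

    Devices that have no queue/ directory, or none of the two files,
    are simply left out of the result.
    """
    root = Path(sysfs_block)
    if not root.is_dir():
        return {}
    limits = {}
    for dev in sorted(root.iterdir()):
        entry = {}
        for name in ("max_sectors_kb", "max_hw_sectors_kb"):
            path = dev / "queue" / name
            if path.is_file():
                entry[name] = int(path.read_text().strip())
        if entry:
            limits[dev.name] = entry
    return limits


if __name__ == "__main__":
    for dev, entry in read_queue_limits().items():
        print(dev, entry)
```

Any device without a queue/ directory would just be absent from the output, which makes the gap for md devices described above easy to spot.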

the patch is against 2.6.18-128.1.14.el5-lustre1.8.1

cheers,
robin
--
Dr Robin Humble, HPC Systems Analyst, NCI National Facility
-------------- next part --------------
--- linux-2.6.18.x86_64.lustre/include/linux/blkdev.h	2009-08-18 17:40:51.000000000 +1000
+++ linux-2.6.18.x86_64.lustre.hackBlock/include/linux/blkdev.h	2009-08-21 13:47:55.000000000 +1000
@@ -778,7 +778,7 @@
 #define MAX_PHYS_SEGMENTS 128
 #define MAX_HW_SEGMENTS 128
 #define SAFE_MAX_SECTORS 255
-#define BLK_DEF_MAX_SECTORS 1024
+#define BLK_DEF_MAX_SECTORS 2048
 
 #define MAX_SEGMENT_SIZE	65536
 

