[Lustre-discuss] Recipe for 1MB iosize

Richard Smith Richard.Smith at Sun.COM
Mon Aug 10 20:23:06 PDT 2009


I would like to be able to increase the average size of i/o requests I'm
making to the individual disks in my underlying JBOD disk array, but I
seem to be running into a limit somewhere that stops it at around 205KB.

In order to remove as many extraneous factors as I could, I ran some tests
against a single physical disk, opened with O_DIRECT and driven with 1MB
random i/o. [The real workload I'm interested in consists of many streams of
sequential i/o to a large number of files, quasi-concurrently.]
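
For concreteness, the test amounts to little more than the sketch below --
the device path, offset span and request count are placeholders for whatever
is convenient, and error handling is trimmed:

/*
 * 1MB O_DIRECT random reads against a raw disk, so that avgrq-sz can be
 * watched in iostat -x while it runs.  Device path, span and request
 * count are placeholders.
 * Build with: gcc -D_FILE_OFFSET_BITS=64 -o dio_test dio_test.c
 */
#define _GNU_SOURCE             /* for O_DIRECT */
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define IO_SIZE  (1024 * 1024)          /* 1MB per request */
#define NUM_IOS  10000
#define SPAN     (100ULL << 30)         /* randomise offsets over the first
                                           100GB; keep within the disk size */

int main(void)
{
        void *buf;
        int fd, i;

        /* O_DIRECT needs an aligned buffer; 4KB covers the usual cases */
        if (posix_memalign(&buf, 4096, IO_SIZE) != 0) {
                perror("posix_memalign");
                return 1;
        }

        fd = open("/dev/sdc", O_RDONLY | O_DIRECT);     /* placeholder device */
        if (fd < 0) {
                perror("open");
                return 1;
        }

        srandom(getpid());
        for (i = 0; i < NUM_IOS; i++) {
                /* pick a 1MB-aligned offset at random within SPAN */
                off_t off = ((off_t)(random() % (SPAN / IO_SIZE))) * IO_SIZE;

                if (pread(fd, buf, IO_SIZE, off) != IO_SIZE) {
                        perror("pread");
                        break;
                }
        }

        close(fd);
        free(buf);
        return 0;
}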

Increasing max_sectors_kb from 512 to 4096 raised the average request size
from 341 sectors to 409 sectors, but that's still a long way off 1MB:

  rrqm/s   wrqm/s     r/s    w/s     rsec/s   wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
    0.00     0.00  271.00   0.00  111001.60     0.00    409.60      4.01   14.73   3.67  99.54
    0.00     0.00  266.00   0.00  108953.60     0.00    409.60      4.01   15.01   3.75  99.62
    0.00     0.00  269.46   0.00  110371.26     0.00    409.60      4.01   14.86   3.69  99.46
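
(iostat's avgrq-sz is in 512-byte sectors, so 409.60 x 512 bytes = ~205KB --
the same ceiling as above. Note also that the value written to max_sectors_kb
is itself bounded by max_hw_sectors_kb in the same /sys/block/<device>/queue
directory.)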

On a different box I had compiled the MPT SAS Fusion 4.00.16 driver, and
noticed in the source that I might be limited by SCSI_MAX_PHYS_SEGMENTS=128,
which in the worst case means only 128 x 4KB per i/o. There could, however,
be something else again that I've overlooked.
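
(Worked through, that worst case is 128 segments x 4KB pages = 512KB per
request when every scatter/gather segment is a single page, so even with
max_sectors_kb at 4096 the requests couldn't reach 1MB -- though that on its
own still doesn't account for the ~205KB average.)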

Is there such a thing as a recipe for 1MB i/o at the relatively low-level
block device layer? If I can achieve that, then I'm guessing I can configure
my metadevices with a 1MB chunk size and use
mkfs.lustre --mkfsoptions="-E stride=..."
to encourage the use of 1MB i/o higher up the stack.
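
(For what that would look like: with 4KB filesystem blocks a 1MB chunk
corresponds to -E stride=256, since stride is given in filesystem blocks, and
the matching stripe-width extended option would be 256 times the number of
data disks -- e.g. 2048 for a hypothetical 8+2 RAID6 metadevice. But none of
that helps until the block layer will actually issue 1MB requests.)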


-- 
============================================================================
    ,-_|\   Richard Smith Staff Engineer PAE
   /     \  Sun Microsystems                   Phone : +61 3 9869 6200
richard.smith at Sun.COM                         Direct : +61 3 9869 6224
   \_,-._/  476 St Kilda Road                    Fax : +61 3 9869 6290
        v   Melbourne Vic 3004 Australia
===========================================================================



