[Lustre-discuss] Anyone using this or similar system

James Robnett jrobnett at aoc.nrao.edu
Tue Aug 10 04:49:36 PDT 2010


>  Hello list members,
> We are planning a Lustre setup for our lab. I was searching for cheap but
> reliable DAS and/or JBOD solutions for setting this up and came across
> this:
>
> http://www.supermicro.com/products/chassis/4U/846/SC846E26-R1200.cfm
>
> I would like to know if anyone has any experience in setting up this or
> a similar kind of system.

  I've only just started playing with two OSSes based on that chassis.
Each has:
Supermicro X8DTH motherboard (7 PCI-E x8 slots)
Dual quad-core Intel Xeon processors
Four 8-port 3ware 9650 RAID controllers (4 OSTs/OSS)
One Mellanox MT26428 QDR InfiniBand HCA
24 WD2003FYYS 2TB hard drives

  The 24 disks in each OSS are carved into four 4+1+spare RAID5 arrays,
for 8 OSTs in total across the pair.
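
   For reference, that per-controller layout could be scripted with
3ware's tw_cli along these lines.  This is a hedged sketch from memory;
the controller and port numbers are placeholders, so check tw_cli(8)
before trusting any of it:

    # Sketch: one 4+1 RAID5 unit plus a hot spare on controller /c0.
    # Ports 0-5 stand in for six of the chassis' 24 drives.
    tw_cli /c0 add type=raid5 disk=0:1:2:3:4 stripe=64
    tw_cli /c0 add type=spare disk=5
    tw_cli /c0/u0 show    # verify the new unit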

  I hate to admit it, but I skimped on the memory initially and only got
4GB.  Our workload will never see a read cache hit regardless, so I'm not
entirely convinced the OSS is starved at 4GB, though the manual suggests
that's low for 4 OSTs and 2 processors (empirically I'm not convinced).
If you can throw enough memory at it to get disk cache hits, by all means
do.  I'll add more memory at the first hint it's beneficial.
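
   If you do chase cache hits, the OSS read cache that arrived in
Lustre 1.8 is tunable per obdfilter device.  A minimal sketch (the
parameter names are from the 1.8 manual; the 32M cap is just an
illustrative value, not something I've tuned):

    # Is the OSS read cache on for each OST?
    lctl get_param obdfilter.*.read_cache_enable
    # Only cache files up to 32MB so a few multi-GB streams
    # can't blow out 4GB of RAM.
    lctl set_param obdfilter.*.readcache_max_filesize=32M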

    I'm currently client-bound doing total Lustre throughput tests.  For
reasons I don't fully understand (and have been pondering posting about),
I simply can't get more than about 700MB/s of reads on a single client.
The clients seem to be bound re-assembling the replies, which I expected;
I just assumed the peak would be higher.
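
   For what it's worth, the client-side knobs I know to look at in that
situation are below.  This is a sketch, not a fix I've verified; 32 is
just an example value over the 1.8-era default of 8:

    # RPCs kept in flight per OST per client; low defaults can
    # throttle single-client reads over QDR IB.
    lctl get_param osc.*.max_rpcs_in_flight
    lctl set_param osc.*.max_rpcs_in_flight=32
    # Readahead and wire checksums also bear on read peaks.
    lctl get_param llite.*.max_read_ahead_mb
    lctl get_param osc.*.checksums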

   With 3 clients I get 2.1GB/s of reads across the 8 OSTs with IOzone:
three discrete IOzone runs of 8 threads each, doing 1MB I/Os, so each
OST sees 3 concurrent reads, which more or less mimics our software.
Eventually I expect it to peak at around 3GB/s aggregate (1.5GB/s per
OSS).
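
   For anyone who wants to reproduce the numbers, each client ran
something along these lines (a reconstruction, not the exact command;
the file size and path are placeholders):

    # 8 threads per client, 1MB records, write pass then read pass,
    # one file per thread so each maps to a single OST.
    # (bash brace expansion supplies the eight filenames)
    iozone -i 0 -i 1 -t 8 -r 1m -s 4g -F /mnt/lustre/bench/f{1..8}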

   Our data reduction software and cluster size lend themselves to this
type of config: we have many times more multi-GB files than OSTs, with
each file mapped to an individual cluster node for processing.  So no
striping; each file sits on a single OST.  The disks are relatively
reliable, and I don't plan to scale beyond 6-8 OSSes, so reliability is
still manageable.
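
   Pinning that layout down is just a stripe count of 1 on the working
directories; a minimal sketch (the path is a placeholder):

    # Each new file under this directory lives on exactly one OST;
    # -i -1 lets the MDS pick the OST round-robin.
    lfs setstripe -c 1 -i -1 /mnt/lustre/reduction
    lfs getstripe /mnt/lustre/reduction    # confirm the layout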

   We use the same chassis/disks, but a different RAID setup and network
link, for bulk storage.  We have 22 such chassis in all (~500 disks).

James Robnett
NRAO/AOC