[Lustre-discuss] Max bandwidth through a single 4xQDR IB link?

Tue Jun 29 07:15:02 PDT 2010

Hello Ashley, hello Kevin,

I really see no point in using disks to benchmark network performance when
lnet_selftest exists. The benchmark order should be:

- test how much the disks can provide
- test network with lnet_selftest

=> make sure Lustre performance is not much below
   min(disks, lnet_selftest)
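
As a minimal sketch of that check in Python: the function names, the MB/s
figures and the 0.8 tolerance are all hypothetical, since "not much below"
is a judgment call, so plug in your own measurements and threshold.

```python
def expected_ceiling(disk_mb_s, lnet_mb_s):
    """Upper bound for Lustre throughput: the slower of disks and network."""
    return min(disk_mb_s, lnet_mb_s)


def is_acceptable(lustre_mb_s, disk_mb_s, lnet_mb_s, tolerance=0.8):
    """True if the Lustre result reaches at least `tolerance` of the ceiling.

    tolerance=0.8 is an assumed threshold, not a Lustre recommendation.
    """
    return lustre_mb_s >= tolerance * expected_ceiling(disk_mb_s, lnet_mb_s)


if __name__ == "__main__":
    # Hypothetical per-OSS numbers in MB/s, not measurements.
    disks, lnet, lustre = 2500, 3000, 2200
    print("ceiling:", expected_ceiling(disks, lnet), "MB/s")
    print("acceptable:", is_acceptable(lustre, disks, lnet))
```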

Cheers,
Bernd

On Tuesday, June 29, 2010, Kevin Van Maren wrote:
> DAPL is a high-performance interface that uses a small shim to provide a
> common DMA API on top of (in this case) the IB verbs layer.  In general,
> there is a very small performance impact to be able to use the common
> API, so you will not get more large-message bandwidth using native IB
> verbs.
> 
> I've never had enough disk bandwidth behind a node to saturate a QDR IB
> link, so I'm not sure how high LNET will go.  If you have an IB test
> cluster, you should be able to measure the upper limits by creating an
> OST on a loopback device on tmpfs, although you have to ensure the
> client-side cache is not skewing your results (hint: boot the clients with
> something like "mem=1g" to limit the RAM they can use for the cache).
> 
> While the QDR IB link bandwidth is 4GB/s (or around 3.9GB/s with 2KB
> packets), the maximum HCA bandwidth is normally around 3.2GB/s
> (unidirectional), due to the PCIe overhead of breaking the transaction
> into (relatively) small packets and managing the packet flow
> control/credits.  This is independent of the protocol, and limited by
> the PCIe Gen2 x8 interface.  You will see somewhat higher bandwidth
> if your system supports and uses a 256 byte MaxPayload, rather than 128
> bytes.  Use lspci to see what your system is using, as in: "lspci -vv -d
> 15b3: | grep MaxPayload"
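
A rough back-of-the-envelope sketch of the MaxPayload effect, in Python.
It assumes PCIe Gen2 at 500 MB/s per lane per direction (5 GT/s with 8b/10b
coding) and about 24 bytes of per-packet overhead (framing, sequence
number, a 4DW header and LCRC). The function name and the 24-byte figure
are my assumptions. The model ignores flow-control and ACK traffic, so real
HCAs land below these numbers, consistent with the ~3.2GB/s quoted above.

```python
def pcie_gen2_payload_bandwidth(lanes, max_payload, tlp_overhead=24):
    """Estimate usable PCIe Gen2 bandwidth in MB/s for one direction.

    tlp_overhead is the assumed per-packet overhead in bytes; it is a
    simplification and does not model flow-control or ACK packets.
    """
    raw_mb_s = lanes * 500  # 500 MB/s per Gen2 lane after 8b/10b coding
    return raw_mb_s * max_payload / (max_payload + tlp_overhead)


if __name__ == "__main__":
    for mps in (128, 256):
        est = pcie_gen2_payload_bandwidth(8, mps)
        print(f"MaxPayload {mps}: ~{est:.0f} MB/s")
```

Under these assumptions a 256 byte MaxPayload comes out roughly 8-9% ahead
of 128 bytes on a x8 link.
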
> 
> Kevin
> 
> Ashley Pittman wrote:
> > Hi,
> > 
> > Could anyone confirm the maximum achievable bandwidth over a single
> > 4xQDR IB link into an OSS node?  I have many clients doing a write test
> > over IB and want to know the maximum bandwidth we can expect to see for
> > each OSS node.  For MPI over these links we see between 3 and 3.5GB/s,
> > but I suspect Lustre is capable of more than this because it's not using
> > DAPL.  Is this correct?
> > 
> > Ashley.
> 

-- 
Bernd Schubert
DataDirect Networks