[Lustre-discuss] Max bandwidth through a single 4xQDR IB link?

Kevin Van Maren kevin.van.maren at oracle.com
Tue Jun 29 06:57:17 PDT 2010


DAPL is a high-performance interface that uses a small shim to provide a 
common DMA API on top of (in this case) the IB verbs layer.  In general, 
the common API adds only a very small performance cost, so you will not 
get more large-message bandwidth by using native IB verbs.
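
If you want to sanity-check the raw verbs large-message bandwidth on 
your hardware, the perftest tools are the usual way.  A rough sketch 
(assumes the perftest package is installed on both nodes; "oss1" here 
is just a placeholder hostname):

    # on the server (OSS) node:
    ib_write_bw -s 1048576
    # on the client node, pointing at the server:
    ib_write_bw -s 1048576 oss1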

I've never had enough disk bandwidth behind a node to saturate a QDR IB 
link, so I'm not sure how high LNET will go.  If you have an IB test 
cluster, you should be able to measure the upper limits by creating an 
OST on a loopback device on tmpfs, although you have to ensure the 
client-side cache is not skewing your results (hint: boot the clients 
with something like "mem=1g" to limit the RAM they can use for the cache).
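
Something along these lines should work for the loopback OST, although 
I haven't verified the exact options, so treat it as a sketch; adjust 
the tmpfs size, fsname, and MGS NID (10.0.0.1@o2ib is just a 
placeholder) for your test cluster:

    # on the OSS: put the OST backing file on tmpfs
    mount -t tmpfs -o size=16g tmpfs /mnt/ram
    # format a file-backed OST (size in KB) and mount it via loopback
    mkfs.lustre --ost --fsname=testfs --mgsnode=10.0.0.1@o2ib \
        --device-size=8000000 /mnt/ram/ost0
    mkdir -p /mnt/ost0
    mount -t lustre -o loop /mnt/ram/ost0 /mnt/ost0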

While the QDR IB link bandwidth is 4GB/s (or around 3.9GB/s with 2KB 
packets), the maximum HCA bandwidth is normally around 3.2GB/s 
(unidirectional), due to the PCIe overhead of breaking the transaction 
into (relatively) small packets and managing the packet flow 
control/credits.  This is independent of the protocol, and is limited 
by the PCIe Gen2 x8 interface.  You will see somewhat higher bandwidth 
if your system supports and uses a 256-byte MaxPayload, rather than 128 
bytes.  Use lspci to see what your system is using, as in: "lspci -vv -d 
15b3: | grep MaxPayload"
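
For what it's worth, the output looks something like this (illustrative 
only, your values will differ); DevCap is what the HCA supports, the 
DevCtl line is what was actually negotiated:

    # lspci -vv -d 15b3: | grep MaxPayload
    DevCap: MaxPayload 256 bytes, ...              <- supported
            MaxPayload 128 bytes, MaxReadReq 512 bytes   <- in use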

Kevin


Ashley Pittman wrote:
> Hi,
>
> Could anyone confirm for me the maximum achievable bandwidth over a single 4xQDR IB link into an OSS node?  I have many clients doing a write test over IB and want to know the maximum bandwidth we can expect to see for each OSS node.  For MPI over these links we see between 3 and 3.5GB/s, but I suspect Lustre is capable of more than this because it's not using DAPL; is this correct?
>
> Ashley.
>
>   



