[Lustre-discuss] Max bandwidth through a single 4xQDR IB link?

Atul Vidwansa Atul.Vidwansa at oracle.com
Wed Jun 30 21:20:26 PDT 2010


I would do the following tests to see real-life performance with QDR IB:

1. See what bandwidth has been negotiated by the IB HCA and your system 
using ibv_devinfo (example commands after this list).
2. Use ib_rdma_bw and ib_send_bw between a pair of Lustre client and 
server to see how much raw bandwidth you are getting.
3. Use lnet_selftest unidirectional (read OR write) and bidirectional 
(read AND write) tests to see how much LNET can give you. See the Lustre 
manual for details on using lnet_selftest; a sample session is sketched 
after this list.
4. Benchmark your storage using sgpdd_survey or XDD.
5. Run IOR or IOzone from "multiple" clients to see what throughput you 
are getting. If you are interested in single-client results, you can run 
a multi-threaded "dd" workload from one client on a Lustre filesystem 
(see the dd example below).

Cheers,
_Atul

On 06/29/2010 07:45 PM, Bernd Schubert wrote:
> Hello Ashley, hello Kevin,
>
> I really see no point in using disks to benchmark network performance
> when lnet_selftest exists. The benchmark order should be:
>
> - test how much the disks can provide
> - test the network with lnet_selftest
>
> =>  make sure Lustre performance is not much below
>     min(disks, lnet_selftest)
>
>
> Cheers,
> Bernd
>
>
>
> On Tuesday, June 29, 2010, Kevin Van Maren wrote:
>    
>> DAPL is a high-performance interface that uses a small shim to provide a
>> common DMA API on top of (in this case) the IB verbs layer.  In general,
>> the cost of going through the common API is very small, so you will not
>> get more large-message bandwidth by using native IB verbs.
>>
>> I've never had enough disk bandwidth behind a node to saturate a QDR IB
>> link, so I'm not sure how high LNET will go.  If you have an IB test
>> cluster, you should be able to measure the upper limits by creating an
>> OST on a loopback device on tmpfs, although you have to ensure the
>> client-side cache is not skewing your results (hint: boot the clients
>> with something like "mem=1g" to limit the RAM they can use for the cache).
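>>
>> An untested sketch of the tmpfs-backed OST idea (MGS NID, filesystem
>> name, sizes and paths are all placeholders):
>>
>>     mount -t tmpfs -o size=16g tmpfs /mnt/ram
>>     dd if=/dev/zero of=/mnt/ram/ost0.img bs=1M count=15000
>>     losetup /dev/loop0 /mnt/ram/ost0.img
>>     mkfs.lustre --fsname=testfs --ost --index=0 --mgsnode=<mgs_nid> /dev/loop0
>>     mount -t lustre /dev/loop0 /mnt/ost0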
>>
>> While the QDR IB link bandwidth is 4GB/s (or around 3.9GB/s with 2KB
>> packets), the maximum HCA bandwidth is normally around 3.2GB/s
>> (unidirectional), due to the PCIe overhead of breaking the transaction
>> into (relatively) small packets and managing the packet flow
>> control/credits.  This is independent of the protocol, and limited by
>> the PCIe Gen2 x8 interface.  You will see somewhat higher bandwidth
>> if your system supports and uses a 256-byte MaxPayload rather than 128
>> bytes.  Use lspci to see what your system is using, as in: "lspci -vv -d
>> 15b3: | grep MaxPayload"
>>
>> Kevin
>>
>> Ashley Pittman wrote:
>>      
>>> Hi,
>>>
>>> Could anyone confirm for me the maximum achievable bandwidth over a single
>>> 4x QDR IB link into an OSS node?  I have many clients doing a write test
>>> over IB and want to know the maximum bandwidth we can expect to see for
>>> each OSS node.  For MPI over these links we see between 3 and 3.5 GB/s,
>>> but I suspect Lustre is capable of more than this because it's not using
>>> DAPL, is this correct?
>>>
>>> Ashley.
>>>        
>
>    