[Lustre-discuss] Max bandwidth through a single 4xQDR IB link?
Atul Vidwansa
Atul.Vidwansa at oracle.com
Wed Jun 30 21:20:26 PDT 2010
I would do the following tests to see real-life performance with QDR IB:
1. See what bandwidth has been negotiated between the IB HCA and your system
using ibv_devinfo
2. Use ib_rdma_bw and ib_send_bw between a Lustre client and server pair
to see how much raw bandwidth you are getting.
3. Use lnet_selftest unidirectional (read OR write) and bidirectional
(read AND write) tests to see how much LNET can give you. See the Lustre
manual on using lnet_selftest.
4. Benchmark your storage using sgpdd_survey or XDD.
5. Run IOR or IOzone from "multiple" clients to see what throughput you
are getting. If you are interested in single-client results, you can run
a multi-threaded "dd" command from the client on a Lustre filesystem.
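For reference, the steps above might look roughly like this on the command line. Treat it as a sketch: the NIDs, IP addresses, mount point, and thread count are placeholders, and exact option syntax varies between tool versions.

```shell
# 1. Check the negotiated IB link (expect active_width 4X, active_speed 10.0 Gbps for QDR)
ibv_devinfo -v | grep -E 'active_width|active_speed'

# 2. Raw RDMA bandwidth: start the server side first, then point the client at it
ib_send_bw                   # on the server node
ib_send_bw <server-ip>       # on the client node

# 3. LNET-level bandwidth with lnet_selftest (NIDs below are placeholders)
modprobe lnet_selftest
export LST_SESSION=$$
lst new_session bwtest
lst add_group clients 10.0.0.1@o2ib
lst add_group servers 10.0.0.2@o2ib
lst add_batch bulk
lst add_test --batch bulk --from clients --to servers brw read size=1M
lst run bulk
lst stat servers             # watch throughput, then Ctrl-C
lst end_session

# 5. Multi-threaded dd from a single client (8 streams; mount point is an example)
for i in $(seq 1 8); do
  dd if=/dev/zero of=/mnt/lustre/ddtest.$i bs=1M count=4096 oflag=direct &
done
wait
```

The direct-I/O flag on dd keeps the client page cache from inflating the numbers, for the same reason Kevin suggests booting clients with limited RAM below.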
Cheers,
_Atul
On 06/29/2010 07:45 PM, Bernd Schubert wrote:
> Hello Ashley, hello Kevin,
>
> I really see no point in using disks to benchmark network performance when
> lnet_selftest exists. The benchmark order should be:
>
> - test how much the disks can provide
> - test network with lnet_selftest
>
> => make sure lustre performance is not much below the
> min(disks, lnet_selftest)
>
>
> Cheers,
> Bernd
>
>
>
> On Tuesday, June 29, 2010, Kevin Van Maren wrote:
>
>> DAPL is a high-performance interface that uses a small shim to provide a
>> common DMA API on top of (in this case) the IB verbs layer. In general,
>> there is a very small performance impact to be able to use the common
>> API, so you will not get more large-message bandwidth using native IB
>> verbs.
>>
>> I've never had enough disk bandwidth behind a node to saturate a QDR IB
>> link, so I'm not sure how high LNET will go. If you have an IB test
>> cluster, you should be able to measure the upper limits by creating an
>> OST on a loopback device on tmpfs, although you have to ensure the
>> client-side cache is not skewing your results (hint: boot the client with
>> something like "mem=1g" to limit the RAM available for caching).
>>
>> While the QDR IB link bandwidth is 4GB/s (or around 3.9GB/s with 2KB
>> packets), the maximum HCA bandwidth is normally around 3.2GB/s
>> (unidirectional), due to the PCIe overhead of breaking the transaction
>> into (relatively) small packets and managing the packet flow
>> control/credits.  This is independent of the protocol, and limited by
>> the PCIe Gen2 x8 interface.  You will see somewhat higher bandwidth
>> if your system supports and uses a 256 byte MaxPayload, rather than 128
>> bytes. Use lspci to see what your system is using, as in: "lspci -vv -d
>> 15b3: | grep MaxPayload"
>>
>> Kevin
>>
>> Ashley Pittman wrote:
>>
>>> Hi,
>>>
>>> Could anyone confirm for me the maximum achievable bandwidth over a single
>>> 4xQDR IB link into an OSS node?  I have many clients doing a write test
>>> over IB and want to know the maximum bandwidth we can expect to see for
>>> each OSS node.  For MPI over these links we see between 3 and 3.5GB/s,
>>> but I suspect Lustre is capable of more than this because it's not using
>>> DAPL; is this correct?
>>>
>>> Ashley.
>>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>
>
>
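As a rough sanity check on Kevin's MaxPayload numbers: PCIe Gen2 x8 delivers about 4 GB/s per direction after 8b/10b encoding, and each TLP carries on the order of 20 bytes of header/CRC/framing overhead (an assumed ballpark figure; the exact overhead depends on the platform), so payload efficiency can be estimated as payload / (payload + overhead):

```shell
# Approximate PCIe payload throughput for 128 B vs 256 B MaxPayload.
# 4000 MB/s raw (Gen2 x8, post-8b/10b); ~20 B assumed per-TLP overhead.
for mps in 128 256; do
  awk -v p="$mps" 'BEGIN { printf "MaxPayload %3d B: ~%.0f MB/s\n", p, 4000 * p / (p + 20) }'
done
```

Read completions, flow-control credit management, and DMA scheduling all cost more on top of this simple model, which is consistent with real HCAs landing around the ~3.2 GB/s Kevin quotes, below even the 128-byte estimate.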