[lustre-discuss] Lustre over 10 Gb Ethernet with and without RDMA

Ben Evans bevans at cray.com
Fri Jun 19 09:54:08 PDT 2015


I’d put in a set of LNet gateways, and possibly mount the FS as NFS or CIFS in one or two places if there is some need to access it from ‘outside’.
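For the ‘outside’ access piece, a minimal sketch of what a gateway node re-exporting Lustre over NFS could look like (the host names, file system name, and subnet below are made up):

    # On the gateway: mount Lustre as an ordinary client
    mount -t lustre mgsnode@tcp0:/lustrefs /mnt/lustre

    # /etc/exports -- re-export the mount point to the corporate subnet
    /mnt/lustre  10.20.0.0/16(rw,no_root_squash)

    # Activate the export table and serve
    exportfs -ra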

If it’s something like Corporate IT or Security demanding that everything be homogeneous, find some way of charging them for the slowdowns you’ll have.  Also note that you’ll see some really weird issues if someone starts running port scanners against Lustre.

-Ben Evans

From: Jeff Johnson [mailto:jeff.johnson at aeoncomputing.com]
Sent: Friday, June 19, 2015 12:50 PM
To: INKozin
Cc: Ben Evans; lustre-discuss at lists.lustre.org
Subject: Re: [lustre-discuss] Lustre over 10 Gb Ethernet with and without RDMA

Why choose? Why not install an LNet router (QDR <-> 10GbE), or dual-home your MDS and OSS nodes with both a QDR HCA and a 10GbE NIC?
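For reference, the LNet side of that is a few lines in /etc/modprobe.d/lustre.conf; a minimal sketch (the interface names and NIDs below are hypothetical):

    # On the LNet router node (one leg on IB, one on 10GbE):
    options lnet networks="o2ib0(ib0),tcp0(eth0)" forwarding=enabled

    # On 10GbE-only clients, reach the IB fabric via the router's tcp0 NID:
    options lnet networks="tcp0(eth0)" routes="o2ib0 10.0.1.1@tcp0"

    # Dual-homed MDS/OSS nodes simply list both networks:
    options lnet networks="o2ib0(ib0),tcp0(eth0)"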

--Jeff

On Fri, Jun 19, 2015 at 9:10 AM, INKozin <i.n.kozin at googlemail.com> wrote:
I know that QDR IB gives the best bang for the buck currently, and that's what we have now. However, for various reasons we are looking at alternatives, hence the question. Thank you very much for the information, Ben.

On 19 June 2015 at 16:24, Ben Evans <bevans at cray.com> wrote:
It’s faster in that you eliminate all the TCP overhead and latency (something on the order of a 20% improvement in speed, IIRC; it’s been several years).

Balancing your network performance against what your disks can provide is a whole other level of system design and implementation.  You can stack enough disks or SSDs behind a server that the network becomes your bottleneck, put a fast enough network in front of few enough disks that the drives become your bottleneck, or stack up enough of both that the PCIe bus becomes your bottleneck.
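As a rough back-of-envelope illustration (typical, made-up numbers): a 10GbE link carries at most ~1.25 GB/s on the wire, call it ~1.1 GB/s usable, while an OSS with 12 SAS drives at ~150 MB/s each can source ~1.8 GB/s, so there the NIC is the bottleneck. Put QDR IB (~4 GB/s of data) in front of the same 12 drives and the disks become the bottleneck; keep adding drives or SSDs and eventually you hit the slot itself, e.g. ~4 GB/s for PCIe 2.0 x8.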

Take the time to compare cost/performance against InfiniBand. Since most systems have a dedicated client/server network, you might as well go as fast as you can.

-Ben Evans

From: igko50 at gmail.com [mailto:igko50 at gmail.com] On Behalf Of INKozin
Sent: Friday, June 19, 2015 11:10 AM
To: Ben Evans
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [lustre-discuss] Lustre over 10 Gb Ethernet with and without RDMA

Ben, is it possible to quantify "faster"?
Understandably, for a single client on an empty cluster it may feel "faster" but on a busy cluster with many reads and writes in flight I'd have thought the limiting factor is the back end's throughput rather than the network, no? As long as the bandwidth to a client is somewhat higher than the average i/o bandwidth (back end's throughput divided by the number of clients) the client should be content.
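To put hypothetical numbers on that: a back end capable of 20 GB/s shared by 200 active clients averages only 100 MB/s per client, an order of magnitude below what a single 10GbE link can carry, so under full load the client NIC is rarely the limit. The caveat is latency-sensitive work (metadata, small I/O), where a faster fabric can still help even when aggregate bandwidth is not the constraint.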

On 19 June 2015 at 14:46, Ben Evans <bevans at cray.com> wrote:
It is faster, but I don’t know what the price/performance tradeoff is, as I only used it as an engineer.

As an alternative, take a look at RoCE; it does much the same thing but runs over ordinary Ethernet switches (you still need RDMA-capable NICs).  It’s still pretty new, though, so you might hit some speed bumps.

-Ben Evans

From: lustre-discuss [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of INKozin
Sent: Friday, June 19, 2015 5:43 AM
To: lustre-discuss at lists.lustre.org
Subject: [lustre-discuss] Lustre over 10 Gb Ethernet with and without RDMA

My question is about the performance advantages of Lustre with RDMA over 10 Gb Ethernet. When using 10 Gb Ethernet to build Lustre, is it worth paying the premium for iWARP? I understand that iWARP essentially reduces latency, but I am less sure of its specific implications for storage. Would it improve performance on small files? Any pointers to representative benchmarks would be much appreciated.

Chelsio has released a white paper in which they compare Lustre with RDMA over 40 Gb Ethernet (iWARP) against FDR IB
http://www.chelsio.com/wp-content/uploads/resources/Lustre-Over-iWARP-vs-IB-FDR.pdf
where they claim comparable performance from the two.
How much worse would the throughput at small block sizes be without iWARP?
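One way to get representative numbers on your own hardware is LNet selftest, which measures raw LNet bandwidth between two nodes independently of the disks. A minimal sketch (the NIDs below are placeholders; load the lnet_selftest module on all participating nodes first):

    export LST_SESSION=$$
    lst new_session bw_test
    lst add_group clients 192.168.1.10@tcp
    lst add_group servers 192.168.1.20@tcp
    lst add_batch bulk
    lst add_test --batch bulk --from clients --to servers brw write size=1M
    lst run bulk
    lst stat clients servers   # Ctrl-C to stop sampling
    lst end_session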

Thank you
Igor






--
------------------------------
Jeff Johnson
Co-Founder
Aeon Computing

jeff.johnson at aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage