[Lustre-discuss] Lustre with 10GbE or Infiniband?
Scott Atchley
atchley at myri.com
Wed Feb 11 13:35:47 PST 2009
On Feb 11, 2009, at 2:25 PM, Brian J. Murrell wrote:
> On Wed, 2009-02-11 at 11:08 -0800, Jeffrey Bennett wrote:
>> Hi,
>>
>> Has anybody done any performance comparison between Lustre with
>> 10GbE and Lustre with Infiniband 4X SDR? I wonder if they perform
>> similarly.
>
> While I don't have any performance numbers or experience for you, I
> will mention the differences in the way Lustre uses those two
> technologies.
>
> On 10GbE, Lustre (via its sock LND) will use the TCP/IP stack on top
> of the Ethernet stack. With Infiniband, we communicate directly with
> the I/B stack (via the o2ib LND) and take direct advantage of its RDMA
> capabilities to achieve a very high percentage of wire speed.
>
> My gut feeling is that the overhead of TCP/IP carves some percentage
> out of your ability to achieve full wire speed.
>
> Maybe some others here, including our benchmarking folks here at Sun,
> can provide some real-world experiences and comparisons.
>
> b.

Jeffrey,

To add to Brian's comments, IB 4X SDR is limited to about 700-750 MB/s
by the fabric. O2IBLND cannot go faster than the slower of the fabric
and the PCI-E connection.
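
As a rough sketch of that ceiling: the function names below are made up for illustration, and the PCI-E slot (Gen1 x8 at 250 MB/s per lane) is an assumption rather than anything measured here. The 4X SDR arithmetic is 4 lanes at 2.5 Gb/s with 8b/10b encoding, which gives 1000 MB/s on paper versus the 700-750 MB/s seen in practice.

```python
# Rough bandwidth ceiling for O2IBLND: the slower of the fabric and the
# PCI-E slot. Names are illustrative; the PCI-E figure is an assumption.

def ib_data_rate_mb_s(lanes, gbit_per_lane, encoding=8 / 10):
    """Theoretical data rate in MB/s after line encoding (1 MB = 10**6 bytes)."""
    return lanes * gbit_per_lane * encoding * 1000 / 8


def lnd_ceiling_mb_s(fabric_mb_s, pcie_mb_s):
    """The LND cannot exceed the slower of the two links."""
    return min(fabric_mb_s, pcie_mb_s)


sdr_4x_theoretical = ib_data_rate_mb_s(lanes=4, gbit_per_lane=2.5)
sdr_4x_observed = 750.0    # upper end of the 700-750 MB/s range above
pcie_gen1_x8 = 8 * 250.0   # assumed slot: PCI-E Gen1 x8, per direction

print("4X SDR on paper:", sdr_4x_theoretical, "MB/s")
print("O2IBLND ceiling:", lnd_ceiling_mb_s(sdr_4x_observed, pcie_gen1_x8), "MB/s")
```

With a narrower slot (say x4) the PCI-E side could get close to the fabric figure, so it is worth checking both.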

SOCKLND is limited by a copy on the receive side. When a client
writes, the server has to copy the data out. When a client reads, the
client has to copy the data out. Because of this, from a server's
point of view, multiple-client read performance can scale with the
number of clients (the server is sending with zero-copy to multiple
clients) and can reach line rate. I did some tests a couple of years
ago with SOCKLND and our NICs:

http://wiki.lustre.org/index.php?title=Myri-10G_Ethernet

It shows a single server with 1 and 3 clients reading and writing.
When 3 clients read, it got very close to line rate.

Indiana University won the SC07 Bandwidth Challenge using Lustre over
the wide area. They used SOCKLND with Myricom NICs and top-of-the-line
DDN storage. They saturated a 10 Gb/s link (sending and receiving
simultaneously), but I think it took a couple of DDN systems and
corresponding OSSes.

If your storage cannot exceed 700-750 MB/s, then either interconnect
should work for you.
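
A minimal way to sanity-check that for a given setup, assuming you know roughly what your storage can sustain. The helper name and table are hypothetical, the 10GbE figure is the raw 10 Gb/s line rate (SOCKLND may land below it), and the IB figure is the low end of the range above.

```python
# Hypothetical helper: for a given sustained storage rate, report whether
# the storage or the network is the likely bottleneck on each link.

LINK_CEILING_MB_S = {
    "10GbE (SOCKLND)": 1250.0,     # raw 10 Gb/s line rate; real results may be lower
    "IB 4X SDR (O2IBLND)": 700.0,  # low end of the 700-750 MB/s fabric limit
}


def bottleneck(storage_mb_s):
    """Map each link name to 'storage' or 'network', whichever is slower."""
    return {
        name: "storage" if storage_mb_s <= ceiling else "network"
        for name, ceiling in LINK_CEILING_MB_S.items()
    }


for rate in (600.0, 1000.0):
    print(rate, "MB/s storage ->", bottleneck(rate))
```

At 600 MB/s both links come out storage-bound, which is the "either should work" case; at 1000 MB/s the SDR fabric becomes the limit.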
Scott