[lustre-discuss] weird issue w. lnet routers

John Casu john at chiraldynamics.com
Wed Nov 29 16:15:52 PST 2017


thanks guys for all your help.

looks like the issue is fundamentally poor performance across 100GbE, where I'm only
getting ~50Gb/s using iperf. I believe the MTU is set correctly across all my systems,
which are using ConnectX-4 cards in 100GbE mode.
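
For anyone following along, here is a rough back-of-the-envelope sketch of why the iperf number would explain the routed results. It assumes each of the 2 routers has a single 100GbE port and that traffic spreads evenly over both (neither is stated in the thread), and it ignores protocol overhead. The function names are just illustrative.

```python
# Rough ceiling on routed LNet throughput, in GB/s.
# Assumptions (not confirmed in the thread): one 100GbE port per router,
# traffic balanced evenly across routers, no protocol overhead.

def gbit_to_gbyte(gbit_per_s: float) -> float:
    """Convert Gb/s to GB/s (8 bits per byte)."""
    return gbit_per_s / 8.0

def routed_ceiling_gbyte(num_routers: int, per_link_gbit: float) -> float:
    """Aggregate Ethernet-side bandwidth across all routers, in GB/s."""
    return num_routers * gbit_to_gbyte(per_link_gbit)

NUM_ROUTERS = 2          # from the thread
LINE_RATE_GBIT = 100.0   # nominal 100GbE line rate
MEASURED_GBIT = 50.0     # approximate iperf result from the thread

nominal = routed_ceiling_gbyte(NUM_ROUTERS, LINE_RATE_GBIT)
measured = routed_ceiling_gbyte(NUM_ROUTERS, MEASURED_GBIT)

print(f"ceiling at line rate:      {nominal} GB/s")   # 25.0 GB/s
print(f"ceiling at measured iperf: {measured} GB/s")  # 12.5 GB/s
```

Under those assumptions, the ~10GB/s seen through the routers sits just under a 12.5 GB/s ceiling, while the 15GB/s direct InfiniBand figure would only be reachable if the Ethernet links ran much closer to line rate.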

thanks again,
-john



On 11/28/17 9:03 PM, Colin Faber wrote:
> Are peer credits set appropriately across your fabric?
> 
> On Nov 28, 2017 8:40 PM, "john casu" <john at chiraldynamics.com> wrote:
> 
>     Thanks,
>     just about to try that MTU setting.
> 
>     It's a small lustre system... 2 routers, MDS/MGS pair, OSS pair, JBOD pair (112 drives for OST)
>     and yes, routing between EDR & 100GbE
> 
>     -john
> 
>     On 11/28/17 7:28 PM, Raj wrote:
> 
>         John, increasing the MTU size on the Ethernet side should increase the b/w. I also have a feeling that some LNet routers and/or
>         intermediate switches/routers do not have jumbo frames turned on (some switches need to be set to 9212 bytes).
>         How many LNet routers are you using? I believe you are routing between EDR IB and 100GbE.
> 
> 
>         On Tue, Nov 28, 2017 at 7:21 PM John Casu <john at chiraldynamics.com> wrote:
> 
>              just built a system w. LNet routers that bridge InfiniBand & 100GbE, using CentOS's built-in InfiniBand support
>              servers are InfiniBand, clients are 100GbE (ConnectX-4 cards)
> 
>              my direct write performance from clients over Infiniband is around 15GB/s
> 
>              When I introduced the lnet routers, performance dropped to 10GB/s
> 
>              Thought the problem was an MTU of 1500, but when I changed the MTUs to 9000
>              performance dropped to 3GB/s.
> 
>              When I tuned according to John Fragella's LUG slides, things went even slower (1.5GB/s write)
> 
>              does anyone have any ideas on what I'm doing wrong??
> 
>              thanks,
>              -john c.
> 
>              _______________________________________________
>              lustre-discuss mailing list
>              lustre-discuss at lists.lustre.org
>              http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> 

