[lustre-discuss] weird issue w. lnet routers

John Fragalla jfragalla at cray.com
Tue Nov 28 17:42:35 PST 2017


Hi John C,

This is interesting.  When you changed the MTU, did you change it end 
to end, including the switches?  If any hop in the path does not have 
jumbo frames enabled, the path will fall back to a 1500-byte MTU.
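As a quick sanity check (a rough sketch only; the interface name and peer 
IP below are placeholders, not taken from your setup), you can verify the 
MTU actually in effect on each host and test it end to end:

    # on each client, router, and server Ethernet interface
    ip link show dev eth0 | grep mtu      # confirm the active MTU
    ip link set dev eth0 mtu 9000         # enable jumbo frames if needed

    # a don't-fragment ping sized for a 9000-byte MTU (8972 = 9000 - 20 IP
    # header - 8 ICMP header) fails if any hop is still at 1500
    ping -M do -s 8972 <peer-ip>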

Did you tune the servers, routers, and clients with the same LNet 
parameters?  Did you tune the Lustre clients for max_rpcs_in_flight, 
max_dirty_mb, LRU size, checksums, max_pages_per_rpc, etc.?
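For illustration only (a rough sketch with example values, not tuned 
recommendations; check the current settings with lctl get_param first and 
adjust for your workload):

    # client-side OSC / LDLM tunables commonly adjusted for throughput
    lctl set_param osc.*.max_rpcs_in_flight=16
    lctl set_param osc.*.max_dirty_mb=512
    lctl set_param osc.*.max_pages_per_rpc=1024   # 4 MiB RPCs with 4 KiB pages
    lctl set_param osc.*.checksums=0              # disable data checksums
    lctl set_param ldlm.namespaces.*osc*.lru_size=128

The LNet-side credits (e.g. peer_credits for ko2iblnd/ksocklnd, set via 
/etc/modprobe.d) should also be consistent across clients, routers, and 
servers.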

On the switches, the MTU should be set to its maximum, larger than 9000, 
to accommodate the full frame (payload plus headers) coming from the nodes.

Thanks.

jnf

--
John Fragalla
Senior Storage Engineer
High Performance Computing
Cray Inc.
jfragalla at cray.com
+1-951-258-7629

On 11/28/17 5:21 PM, John Casu wrote:
> just built a system w. lnet routers that bridge InfiniBand & 100GbE, 
> using the CentOS built-in InfiniBand support.
> Servers are InfiniBand, clients are 100GbE (ConnectX-4 cards).
>
> my direct write performance from clients over InfiniBand is around 15 GB/s
>
> When I introduced the lnet routers, performance dropped to 10GB/s
>
> I thought the problem was an MTU of 1500, but when I changed the MTUs to 
> 9000,
> performance dropped to 3 GB/s.
>
> When I tuned according to John Fragalla's LUG slides, things went even 
> slower (1.5 GB/s write).
>
> does anyone have any ideas on what I'm doing wrong??
>
> thanks,
> -john c.
>
