[lustre-discuss] weird issue w. lnet routers
jfragalla at cray.com
Tue Nov 28 17:42:35 PST 2017
Hi John C,
This is interesting. When you changed the MTU size, did you change it end
to end, including the switches? If any hop in the path does not have jumbo
frames enabled, the path will fall back to 1500.
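
A rough way to reason about this check, as a sketch only: the usable MTU of a
path is the smallest MTU of any hop, and a non-fragmenting ping sized to the
target MTU shows whether every hop passes it. The hop names and MTU values in
the usage note are hypothetical. `ping -M do -s` is the Linux iputils syntax,
and the 28 bytes subtracted are the IPv4 header (no options) plus the ICMP
header.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def effective_path_mtu(hop_mtus):
    """Return the smallest MTU among the hops of a path.

    hop_mtus maps a hop name (hypothetical labels) to its configured MTU.
    """
    if not hop_mtus:
        raise ValueError("need at least one hop")
    return min(hop_mtus.values())


def ping_probe_command(host, mtu):
    """Build a Linux ping command that sets the don't-fragment flag and
    sends a payload that exactly fills an IP packet of size `mtu`."""
    payload = mtu - IPV4_HEADER - ICMP_HEADER
    return ["ping", "-M", "do", "-c", "3", "-s", str(payload), host]
```

For example, `effective_path_mtu({"client": 9000, "switch": 1500, "router": 9000})`
gives 1500, and `ping_probe_command("router1", 9000)` asks for an 8972-byte
payload. If that ping fails while a 1472-byte one succeeds, some hop is still
at 1500.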
Did you tune the servers, routers, and clients with the same LNet
parameters? Did you tune the Lustre clients regarding max RPCs, dirty_mb,
LRU, checksums, max_pages_per_rpc, etc.?
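
To compare client settings across nodes, something like the following sketch
may help. Mapping the shorthand above to these `lctl` parameter names is an
assumption on my part, and no values here are recommendations; the helper
names are hypothetical.

```python
# Assumed mapping of the shorthand above to lctl parameter names.
CLIENT_PARAMS = [
    "osc.*.max_rpcs_in_flight",
    "osc.*.max_dirty_mb",
    "ldlm.namespaces.*.lru_size",
    "osc.*.checksums",
    "osc.*.max_pages_per_rpc",
]


def get_param_commands(params=CLIENT_PARAMS):
    """Build one `lctl get_param` command per parameter pattern."""
    return [["lctl", "get_param", p] for p in params]


def differing_settings(node_a, node_b):
    """Return {name: (value_on_a, value_on_b)} for every setting that
    differs between two nodes, or is missing on one of them."""
    names = sorted(set(node_a) | set(node_b))
    return {
        n: (node_a.get(n), node_b.get(n))
        for n in names
        if node_a.get(n) != node_b.get(n)
    }
```

Run the generated commands on each client, collect the `name=value` output
into one dict per node, and pass two dicts to `differing_settings` to spot
clients that were tuned differently.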
On the switch, the MTU should be set to the maximum, bigger than 9000, to
accommodate the payload size coming from the nodes.
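
As a back-of-the-envelope sketch of why the switch needs more than 9000: a
node with an IP MTU of 9000 puts a larger frame on the wire once the Ethernet
header, an optional 802.1Q VLAN tag, and the frame check sequence are added.
Whether a given switch counts those bytes in its configured MTU varies by
vendor, so treat this as arithmetic only; the function name is hypothetical.

```python
ETH_HEADER = 14  # bytes: destination MAC, source MAC, EtherType
VLAN_TAG = 4     # bytes: optional 802.1Q tag
ETH_FCS = 4      # bytes: frame check sequence


def frame_size_on_wire(ip_mtu, vlan_tagged=False):
    """Return the Ethernet frame size needed to carry one full IP packet
    of `ip_mtu` bytes (preamble and inter-frame gap not included)."""
    size = ip_mtu + ETH_HEADER + ETH_FCS
    if vlan_tagged:
        size += VLAN_TAG
    return size
```

With a node MTU of 9000 this gives 9018 bytes untagged and 9022 bytes with a
VLAN tag, which is why a switch limit of exactly 9000 can be too small.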
Senior Storage Engineer
High Performance Computing
jfragalla at cray.com
On 11/28/17 5:21 PM, John Casu wrote:
> just built a system w. lnet routers that bridge Infiniband & 100GbE,
> using Centos built in Infiniband support
> servers are Infiniband, clients are 100GbE (connectx-4 cards)
> my direct write performance from clients over Infiniband is around 15GB/s
> When I introduced the lnet routers, performance dropped to 10GB/s
> Thought the problem was an MTU of 1500, but when I changed the MTUs to
> performance dropped to 3GB/s.
> When I tuned according to John Fragella's LUG slides, things went even
> slower (1.5GB/s write)
> does anyone have any ideas on what I'm doing wrong??
> -john c.