[lustre-discuss] mount issue and ecmp?

Michael Di Domenico mdidomenico4 at gmail.com
Mon Feb 4 06:19:48 PST 2019


Has anyone heard of Lustre having trouble mounting when ECMP is used
on the compute nodes' default gateway?

I'm trying to mount an existing Lustre filesystem on a new cluster,
where the connections ride over OPA IPoIB and are then routed onto
10GbE via four routers.  I'm using ECMP to distribute the packets over
the four routers.
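
For readers less familiar with ECMP, here is a minimal Python sketch of
per-flow next-hop selection. It is an illustration only: the router and
host addresses, the client port, and the SHA-256 hash are all
assumptions made for the example, and real routers and the Linux kernel
use their own hash functions and policies. The point it shows is that
each direction of a connection is hashed independently, so the reply
path may cross a different router than the request path.

```python
import hashlib

# Hypothetical next-hop addresses for the four routers (not from the post).
ROUTERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]


def pick_next_hop(src_ip, dst_ip, src_port, dst_port, proto="tcp", routers=ROUTERS):
    """Pick a next hop by hashing the flow 5-tuple.

    Illustrative only: this is not the hash any particular router or
    kernel uses, it just demonstrates the per-flow idea.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(routers)
    return routers[index]


if __name__ == "__main__":
    # Hypothetical client and MGS addresses; 988 is the default LNet TCP
    # acceptor port, and 1023 stands in for the client's source port.
    forward = pick_next_hop("192.168.1.10", "172.16.0.5", 1023, 988)
    reverse = pick_next_hop("172.16.0.5", "192.168.1.10", 988, 1023)
    print("client -> MGS via", forward)
    print("MGS -> client via", reverse)
```

The same 5-tuple always maps to the same router, so a single flow stays
on one path in each direction, but nothing forces the two directions to
agree with each other.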

I can mount Lustre on other Ethernet clients, but not the ones behind
my ECMP gateways.  Changing the compute node gateway from ECMP to a
single device doesn't change anything.  I'm not easily able to revert
the network side from ECMP to a single route, so I haven't tried that.

The output I get from mount is: "failed: Input/output error retries left: 0"

Syslog on the client and the MGS seems to show that the connection
between the MGS and the client is being broken during the mount, with
a "timed out for slow reply" message.

