[lustre-discuss] mount issue and ecmp?
Michael Di Domenico
mdidomenico4 at gmail.com
Mon Feb 4 06:19:48 PST 2019
Has anyone heard of Lustre having trouble mounting when ECMP is used
on the compute nodes' default gateway?
I'm trying to mount an existing Lustre filesystem on a new cluster,
where the connections ride over OPA IPoIB, which is then converted to
10GbE via four routers. I'm using ECMP to distribute the packets over
the four routers.
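For anyone unfamiliar with the setup, here is a rough sketch of how per-flow ECMP typically chooses among equal-cost next hops. It is only an illustration: the router addresses, the client and server addresses, and the function name pick_next_hop are hypothetical, and the hash is a stand-in (real routers and kernels use their own hash functions and may hash on fewer fields). Port 988 is the default LNet TCP acceptor port.

```python
import hashlib

# Hypothetical gateway addresses; the real ones are not given in this post.
ROUTERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]


def pick_next_hop(src_ip, dst_ip, src_port, dst_port, proto="tcp", routers=ROUTERS):
    """Pick a next hop by hashing the flow 5-tuple (per-flow ECMP).

    Every packet of one flow hashes to the same router, while different
    flows are spread across the available routers.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    # Reduce the first 4 bytes of the digest to an index into the router list.
    index = int.from_bytes(digest[:4], "big") % len(routers)
    return routers[index]


if __name__ == "__main__":
    # Hypothetical client -> MGS flow on the default LNet acceptor port.
    forward = pick_next_hop("192.168.1.50", "172.16.0.10", 1023, 988)
    # The reply is hashed independently on the far side, so it is not
    # guaranteed to come back through the same router.
    reverse = pick_next_hop("172.16.0.10", "192.168.1.50", 988, 1023)
    print("forward path via", forward)
    print("reverse path via", reverse)
```

The point of the sketch is that the forward and return directions are hashed separately, so a single connection can traverse different routers in each direction.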
I can mount Lustre on other Ethernet clients, but not the ones behind
my ECMP gateways. Changing the compute node gateway from ECMP to a
single device doesn't change anything. I'm not easily able to revert
the network side from ECMP to a single route, so I haven't tried that.
The output I get from mount is "failed: Input/output error retries left: 0".
The syslogs on the client and the MGS seem to show that the connection is
being broken between the MGS and client during the mount, with a "timed
out for slow reply" message.