[lustre-discuss] mount issue and ecmp?

Michael Di Domenico mdidomenico4 at gmail.com
Fri Feb 8 06:33:55 PST 2019


Poking at this further, it doesn't look like it's an ECMP issue.

Are there any known reports of issues when running Lustre over IPoIB
on an OPA fabric?  It seems a stretch, but it's the only difference in
the network at this point.

Can anyone suggest somewhere to look for more debug info?
/var/log/messages and dmesg don't reveal much.
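
One low-level check that may help narrow this down before digging into
Lustre's own logs: a minimal sketch, in Python, that only tests whether a
TCP connection to the MGS can be opened from a client behind the routers.
It assumes the default LNet acceptor port for the tcp LND (988) is in use;
the helper name tcp_reachable and the host name in the usage comment are
placeholders for illustration, not anything from this setup.

```python
import socket

# Assumption: the MGS uses the default LNet acceptor port for the tcp LND.
LNET_ACCEPTOR_PORT = 988


def tcp_reachable(host, port=LNET_ACCEPTOR_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts or networks.
        return False


# Hypothetical usage, run on a compute node (replace the placeholder host):
#   print(tcp_reachable("mgs.example.com"))
```

A successful connect only shows that the routed path carries TCP to that
port; it says nothing about whether the LNet handshake or the later mount
RPCs get through.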




On Mon, Feb 4, 2019 at 9:19 AM Michael Di Domenico
<mdidomenico4 at gmail.com> wrote:
>
> Has anyone heard of Lustre having trouble mounting when ECMP is used
> on the compute nodes' default gateway?
>
> I'm trying to mount an existing Lustre filesystem on a new cluster,
> where the connections ride over OPA IPoIB, which is then converted to
> 10GbE via four routers.  I'm using ECMP to distribute the packets over
> the four routers.
>
> I can mount Lustre on other Ethernet clients, but not the ones behind
> my ECMP gateways.  Changing the compute node gateway from ECMP to a
> single device doesn't change anything.  I'm not easily able to revert
> the network side from ECMP to a single route, so I haven't tried that.
>
> The output I get from mount is "failed: Input/output error retries left: 0"
>
> syslog on the client and the MGS seem to show that the connection is
> being broken between the MGS and client during the mount with a "timed
> out for slow reply" message.

