[lustre-discuss] mounting over ipoib via opa (was: mount issue and ecmp?)

Tue Feb 12 05:22:54 PST 2019

thanks for the suggestion, but i'm not sure it's applicable in my
scenario.  the storage and lnet routers do not have OPA cards
installed, it's only the clients.  the storage does have a mix of
mellanox, ethernet, and qdr hardware, but that's all working fine.  i
have multiple clusters connected to the storage, on all three
interconnects.

i tried setting the arp filters in the document, but it hasn't made
any difference.  i do have the opa tools from intel installed, but i
had tried this with the rhel bundled opa drivers as well and got the
same result.  i've tried building the lustre client with --o2ib=no,
same result.  i tried connected vs datagram mode on the ipoib
interface, same result.
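
for anyone repeating the connected vs datagram test, here is a minimal sketch for confirming which mode an ipoib interface is actually in. it assumes the stock linux ib_ipoib driver, which exposes the mode in sysfs. the interface name ib0 and the helper name ipoib_mode are illustrative assumptions, not taken from my setup.

```python
from pathlib import Path


def ipoib_mode(ifname="ib0", sysfs_root="/sys/class/net"):
    """Return the IPoIB mode string ("datagram" or "connected") for ifname.

    Assumes the standard ib_ipoib sysfs layout (<sysfs_root>/<ifname>/mode).
    Returns None if the file is missing, e.g. the interface is not IPoIB.
    """
    mode_file = Path(sysfs_root) / ifname / "mode"
    try:
        return mode_file.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # "ib0" is an assumed interface name; substitute the real one.
    print("ib0 mode:", ipoib_mode("ib0"))
```

run it on the client before and after switching modes to make sure the change actually took effect.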

if i wasn't able to lctl ping the storage devices from the client, i
would presume there's a network problem.  if i switch from ipoib to
the ethernet mgmt interfaces on the clients, i can mount lustre, which
should confirm and narrow it down to the ipoib interface specifically
and not anything with the network/routing.  and since i have a bevy of
other protocols running over ipoib (nfs/ssh/others) i'm pretty sure
that localizes the issue to something with lustre.

if there's more debugging i can try i'm all ears.

the one message i get from the client in syslog is

Lustre: 253340:0:(client.c:2114:ptlrpc_expire_one_request()) @@@
Request sent has timed out for slow reply: [sent 1549976603/real
1549976603] req@ffff9c1d4bf40300 x1625268266467424/t0(0)
o503->MGCxxx.xx.xx.xx@o2ib100@xxx.xx.xx.xx@o2ib100:26/25 lens 272/8416
e 0 to 1 dl 154997615 ref 2 fl Rpc:X/0/ffffffffff rc 0/-1
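
in case it helps when comparing several of these messages, here is a rough sketch that pulls the numeric fields out of a "timed out for slow reply" line. the helper name parse_timeout_line and the dictionary keys are my own labels, not lustre terminology. it only extracts the values as logged and does not interpret them.

```python
import re
import sys

# Hypothetical helper, not part of Lustre. The key names below simply mirror
# the tokens that appear in the log line (sent, real, x..., o..., dl, rc).
_FIELDS = re.compile(
    r"\[sent (?P<sent>\d+)/real (?P<real>\d+)\]"
    r".*? x(?P<xid>\d+)/t"
    r".*? o(?P<opcode>\d+)->"
    r".*? dl (?P<dl>\d+) "
    r".*? rc (?P<rc>-?\d+)/(?P<rc_reply>-?\d+)"
)


def parse_timeout_line(line):
    """Return a dict of integer fields from one log message, or None."""
    # Collapse whitespace so a message wrapped across lines still matches.
    text = " ".join(line.split())
    match = _FIELDS.search(text)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


if __name__ == "__main__":
    # Reads one complete message from stdin and prints the parsed fields.
    print(parse_timeout_line(sys.stdin.read()))
```

paste a single message into stdin to use it. lines that do not match the expected layout return None rather than a partial result.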

nothing gets reported on the MGS/MDS other than a client connection
restored message.

On Mon, Feb 11, 2019 at 3:15 PM Amir Shehata
<amir.shehata.whamcloud at gmail.com> wrote:
>
> If your routers have multiple OPA/MLX interfaces, we found that linux routing can return the wrong HW address, which causes an address resolution error.
>
> You can try the following linux routing config to see if it helps:
> https://wiki.whamcloud.com/display/LNet/MR+Cluster+Setup
>
> On Mon, 11 Feb 2019 at 12:04, Michael Di Domenico <mdidomenico4 at gmail.com> wrote:
>>
>> i've narrowed down that my issue seems to stem from running over ipoib
>> on an opa network
>>
>> i managed to pull all the routing and other things around so the only
>> difference was whether i rode the ipoib or not
>>
>> when i mount via ethernet, it works fine
>>
>> when i try the same mount via ipoib running on top of opa it gets
>> "input/output error".  i can however lctl ping the storage and i see
>> connections from the client to the MGS.  so some of the connectivity
>> is working, but it's breaking down somewhere else
>>
>> is anyone else running over ipoib on an opa network?  if so, do you
>> have lnet routing?
>>
>> some particulars
>>
>> rhel 7.6 clients
>> 2.10.5 clients
>> 2.5.x lustre servers (cray)
>> lnet routing between storage and other networks
>> currently running tcp ethernet, qdr infinipath, and fdr10 mellanox to
>> the storage through routers
>> no other machines are having mount issues
>>
>>
>> On Fri, Feb 8, 2019 at 9:33 AM Michael Di Domenico
>> <mdidomenico4 at gmail.com> wrote:
>> > poking at this further, it doesn't look like it's an ECMP issue.
>> >
>> > Are there any known reports of issues when running Lustre over ipoib
>> > over an opa fabric?  seems a stretch, but it's the only difference in
>> > the network at this point.
>> >
>> > can anyone suggest somewhere to look for more debug info?
>> > /var/log/messages and dmesg don't reveal much info
>> >
>> > On Mon, Feb 4, 2019 at 9:19 AM Michael Di Domenico
>> > <mdidomenico4 at gmail.com> wrote:
>> > >
>> > > Has anyone heard of lustre having trouble mounting when ECMP is used
>> > > on the compute nodes' default gateway?
>> > >
>> > > I'm trying to mount an existing lustre filesystem on a new cluster,
>> > > where the connections ride over OPA IPoIB, which is then converted to
>> > > 10ge via four routers.  I'm using ECMP to distribute the packets over
>> > > the four routers.
>> > >
>> > > I can mount lustre on other ethernet clients, but not the ones behind
>> > > my ECMP gateways.  Changing the compute node gateway from ECMP to a
>> > > single device doesn't change anything.  I'm not easily able to revert
>> > > the network side from ECMP to a single route, so i haven't tried that.
>> > >
>> > > The output i get from mount is, "failed: Input/output error retries left: 0"
>> > >
>> > > syslog on the client and the MGS seem to show that the connection is
>> > > being broken between the MGS and client during the mount with a "timed
>> > > out for slow reply" message.