[lustre-discuss] Lnet config serving multiple routers and clients

Kumar, Amit ahkumar at mail.smu.edu
Wed Apr 5 10:39:45 PDT 2023


Yes, I do, but it is in the same state as on the server: down. I have also set routing to 1 in LNet:

# lnetctl route show -v
route:
    - net: o2ib
      gateway: 10.215.25.76@o2ib2
      hop: -1
      priority: 0
      health_sensitivity: 1
      state: down
      type: single-hop
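
For anyone checking several nodes for the same symptom, here is a minimal Python sketch that picks the routes that are not up out of `lnetctl route show -v` style output. The sample text is the output above with the NID written in its usual `@` form. `parse_routes` and `down_routes` are hypothetical helper names, not part of lnetctl, and the parser only assumes the flat key/value layout shown here.

```python
import re

# Sample copied from the route output in this message (NID in "@" form).
SAMPLE = """\
route:
    - net: o2ib
      gateway: 10.215.25.76@o2ib2
      hop: -1
      priority: 0
      health_sensitivity: 1
      state: down
      type: single-hop
"""


def parse_routes(text):
    """Split route-show style text into one dict per route entry."""
    routes = []
    for line in text.splitlines():
        m = re.match(r"\s*(-\s+)?([\w ]+?):\s*(.*)$", line)
        if not m or not m.group(3).strip():
            continue  # blank lines and the top-level "route:" key
        if m.group(1):  # a leading "- " starts a new route entry
            routes.append({})
        if routes:
            routes[-1][m.group(2)] = m.group(3).strip()
    return routes


def down_routes(text):
    """Return the routes whose state is anything other than 'up'."""
    return [r for r in parse_routes(text) if r.get("state") != "up"]


for r in down_routes(SAMPLE):
    print(f"route to {r['net']} via {r['gateway']} is {r['state']}")
```

On a live node the text would come from running the command itself; the sketch does not say why the route is down.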

From: Horn, Chris <chris.horn at hpe.com>
Sent: Wednesday, April 5, 2023 12:33 PM
To: Kumar, Amit <ahkumar at mail.smu.edu>; lustre-discuss at lists.lustre.org
Subject: Re: Lnet config serving multiple routers and clients


Do you have the route to o2ib via 10.215.25.76@o2ib2 defined on the client?

Chris Horn

From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Kumar, Amit via lustre-discuss <lustre-discuss at lists.lustre.org>
Date: Wednesday, April 5, 2023 at 12:28 PM
To: lustre-discuss at lists.lustre.org <lustre-discuss at lists.lustre.org>
Subject: [lustre-discuss] Lnet config serving multiple routers and clients

Dear Lustre team,



The Lustre server below (one of many) already serves the current file system to another cluster via LNet routers over o2ib1.

We are now adding a new router to serve another new cluster and its clients over o2ib2.

From the server, I can ping both the o2ib and o2ib2 NIDs of the LNet router. Likewise, from the client I can ping both the o2ib and o2ib2 NIDs of the LNet router. But end to end, the client and server cannot find a route to each other.

Initially I thought it might be related to LU-11641, but since I can reach both NIDs on the immediate peer, I am guessing it is something in my config. I wanted to see if a second set of eyes could point out what I could be doing wrong. Any ideas?



Server: lustre-2.12.5-1.el7.x86_64 on CentOS 7.8
LNet router: lustre-client-2.12.5-1.el7.x86_64 on CentOS 7.8
Client: lustre-client-2.14.0 on *Ubuntu 22.04*



Server (10.212.14.9@o2ib)
LNet router (10.212.1.11@o2ib & 10.215.25.76@o2ib2)
Client (10.215.25.74@o2ib2)



********** Server: ******************

# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib
      local NI(s):
        - nid: 10.212.14.9@o2ib
          status: up
          interfaces:
              0: ib0



# lnetctl route show
route:
    - net: o2ib1
      gateway: 10.212.15.16@o2ib
    - net: o2ib1
      gateway: 10.212.16.20@o2ib
    - net: o2ib2
      gateway: 10.212.1.11@o2ib



# lnetctl ping 10.212.1.11@o2ib
ping:
    - primary nid: 10.215.25.76@o2ib2
      Multi-Rail: True
      peer ni:
        - nid: 10.212.1.11@o2ib
        - nid: 10.215.25.76@o2ib2

# lnetctl ping 10.215.25.76@o2ib2
ping:
    - primary nid: 10.215.25.76@o2ib2
      Multi-Rail: True
      peer ni:
        - nid: 10.212.1.11@o2ib
        - nid: 10.215.25.76@o2ib2

# lnetctl ping 10.215.25.74@o2ib2
manage:
    - ping:
          errno: -1
          descr: failed to ping 10.215.25.74@o2ib2: Input/output error



************LNET router ****************

# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib
      local NI(s):
        - nid: 10.212.1.11@o2ib
          status: up
          interfaces:
              0: ib1
    - net type: o2ib2
      local NI(s):
        - nid: 10.215.25.76@o2ib2
          status: down
          interfaces:
              0: ib0
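
The router output above reports one local NI whose status is not up. As a minimal Python sketch for spotting that in longer listings, the snippet below pairs each `nid` with the `status` line that follows it. `ni_status` is a hypothetical helper name, not an lnetctl feature, and the parser assumes only the layout shown in this message (NIDs in `@` form).

```python
# Sample copied from the router's "lnetctl net show" output in this message.
SAMPLE = """\
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib
      local NI(s):
        - nid: 10.212.1.11@o2ib
          status: up
          interfaces:
              0: ib1
    - net type: o2ib2
      local NI(s):
        - nid: 10.215.25.76@o2ib2
          status: down
          interfaces:
              0: ib0
"""


def ni_status(text):
    """Return (nid, status) pairs in the order they appear in the text."""
    pairs = []
    nid = None
    for line in text.splitlines():
        # Drop indentation and any leading "- " list marker, then split on the first colon.
        key, _, value = line.strip().lstrip("- ").partition(":")
        value = value.strip()
        if key == "nid":
            nid = value
        elif key == "status" and nid is not None:
            pairs.append((nid, value))
            nid = None
    return pairs


not_up = [(nid, status) for nid, status in ni_status(SAMPLE) if status != "up"]
print(not_up)  # [('10.215.25.76@o2ib2', 'down')]
```

This only reports what the listing already says; it does not explain why the o2ib2 interface is marked down.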





# lnetctl discover 10.212.1.11@o2ib
discover:
    - primary nid: 10.212.1.11@o2ib
      Multi-Rail: True
      peer ni:
        - nid: 10.212.1.11@o2ib
        - nid: 10.215.25.76@o2ib2

# lnetctl discover 10.215.25.76@o2ib2
discover:
    - primary nid: 10.212.1.11@o2ib
      Multi-Rail: True
      peer ni:
        - nid: 10.212.1.11@o2ib
        - nid: 10.215.25.76@o2ib2

# lnetctl ping 10.212.14.9@o2ib
ping:
    - primary nid: 10.212.14.9@o2ib
      Multi-Rail: True
      peer ni:
        - nid: 10.212.14.9@o2ib

# lnetctl ping 10.215.25.75@o2ib2
ping:
    - primary nid: 10.215.25.75@o2ib2
      Multi-Rail: True
      peer ni:
        - nid: 10.215.25.75@o2ib2



# lnetctl peer show
peer:
    - primary nid: 10.215.25.74@o2ib2
      Multi-Rail: True
      peer ni:
        - nid: 10.215.25.74@o2ib2
          state: up
    - primary nid: 10.212.1.11@o2ib
      Multi-Rail: True
      peer ni:
        - nid: 10.212.1.11@o2ib
          state: up
        - nid: 10.215.25.76@o2ib2
          state: up
    - primary nid: 10.215.25.75@o2ib2
      Multi-Rail: True
      peer ni:
        - nid: 10.215.25.75@o2ib2
          state: up
    - primary nid: 10.212.14.9@o2ib
      Multi-Rail: True
      peer ni:
        - nid: 10.212.14.9@o2ib
          state: up



************* Client *****************

# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib2
      local NI(s):
        - nid: 10.215.25.74@o2ib2
          status: up
          interfaces:
              0: ibp129s0f1





# lnetctl discover 10.212.1.11@o2ib
discover:
    - primary nid: 10.215.25.76@o2ib2
      Multi-Rail: True
      peer ni:
        - nid: 10.215.25.76@o2ib2
        - nid: 10.212.1.11@o2ib

# lnetctl ping 10.215.25.76@o2ib2
ping:
    - primary nid: 10.215.25.76@o2ib2
      Multi-Rail: True
      peer ni:
        - nid: 10.212.1.11@o2ib
        - nid: 10.215.25.76@o2ib2

# lnetctl ping 10.212.1.11@o2ib
ping:
    - primary nid: 10.215.25.76@o2ib2
      Multi-Rail: True
      peer ni:
        - nid: 10.212.1.11@o2ib
        - nid: 10.215.25.76@o2ib2

# lnetctl ping 10.212.14.9@o2ib
manage:
    - ping:
          errno: -1
          descr: failed to ping 10.212.14.9@o2ib: Input/output error

root@lnet1:~# lnetctl discover 10.212.14.9@o2ib
manage:
    - discover:
          errno: -1
          descr: failed to discover 10.212.14.9@o2ib: No route to host







Thank you,

Amit