[lustre-discuss] lnetctl fails to recreate exact config when importing exported lnet.conf
angelosching at clustertech.com
Fri Sep 4 08:26:11 PDT 2020
If I don't manually add the "LNet router + server" nodes as Multi-Rail-enabled peers before running route add, the route add command creates a non-Multi-Rail peer with only the TCP NID for each "LNet router + server" node (see lines 76-83 in https://pastebin.com/h3wHyCM7). The existence of those 2 peers interferes with normal LNet communication, and the server side prints the kernel message "Bad dest nid n.n.n.n@o2ib (it's my nid but on a different network)".
The same thing happens when lnet.conf is imported with lnetctl. If lnetctl imports the peer section before the route section, no extraneous peer entries are created and everything works as expected (as shown by the output at line 16). If lnetctl imports the route section before the peer section, the scenario described in the previous paragraph occurs and LNet becomes unusable on the client. The order in which lnetctl imports the sections depends on the order in which they appear in the YAML file.
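Since the import order follows the section order in the file, one possible workaround is to rewrite the exported file so the route section comes last. The filter below is a minimal sketch, not an lnetctl feature. `reorder_routes_last` is a hypothetical helper name, and it assumes the exported YAML uses unindented top-level keys such as `net:`, `peer:` and `route:`. It moves every top-level `route:` section to the end and leaves all other lines in their original order.

```shell
#!/bin/sh
# reorder_routes_last (hypothetical helper): read YAML on stdin and write it
# to stdout with every top-level "route:" section moved after all other
# sections, so that a "peer:" section is imported before the routes.
# Assumption: top-level keys start in column 0 and look like "name:";
# indented lines and list items ("- ...") belong to the preceding key.
reorder_routes_last() {
    awk '
        # A column-0 "key:" line starts a new section; remember whether it is "route:".
        /^[A-Za-z_][A-Za-z0-9_ ]*:/ { in_route = ($0 ~ /^route:/) }
        # Buffer route lines instead of printing them now.
        in_route { routes = routes $0 "\n"; next }
        # Everything else (including any preamble) passes through unchanged.
        { print }
        # Emit the buffered route section(s) last.
        END { printf "%s", routes }
    '
}
```

For example, `lnetctl export | reorder_routes_last > /tmp/lnet.conf` produces a reordered copy that can be reviewed before it replaces /etc/lnet.conf. awk is used because it is available on practically every client node without extra packages.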
E: angelosching at clustertech.com
A: 210-213, Lake Side 1, Science Park, Hong Kong
September 4, 2020 11:06 PM, "Mohr Jr, Richard Frank" <rmohr at utk.edu> wrote:
>> On Sep 4, 2020, at 12:11 AM, Angelos Ching <angelosching at clustertech.com> wrote:
>> All steps below carried out on Lustre client:
>> 1. Restart lnet service with empty /etc/lnet.conf
>> 2. lnetctl net add: TCP network using Ethernet
>> 3. lnetctl peer add: 2 peers with "Lnet router + server"@o2ib,tcp NIDs
> The commands you ran were:
> [root@access2 ~]# lnetctl peer add --nid 10.1.4.24@o2ib,10.4.7.24@tcp
> [root@access2 ~]# lnetctl peer add --nid 10.1.4.25@o2ib,10.4.7.25@tcp
> Commands like this can be used when a node has a Multi-Rail setup, for example when a node has multiple
> interfaces on the same network. But for your routers, it looks like the tcp network is available to
> the client and the o2ib network is available to the server. Since those interfaces are not on the
> same network, you don’t need to add both of them as a peer.
>> 4. lnetctl route add: 2 gateways to o2ib network using "Lnet router +
>> server"@TCP NID
> [root@access2 ~]# lnetctl route add --net o2ib --gateway 10.4.7.24@tcp
> [root@access2 ~]# lnetctl route add --net o2ib --gateway 10.4.7.25@tcp
> These should be the only commands you need to run to configure your routing.
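The reply's suggestion can be sketched as a small client-side helper that runs only `net add` and `route add`, with no `peer add`. This is an illustration under assumptions, not a tested procedure. The helper name `configure_client_routes` and the interface name `eth0` in the usage line are hypothetical, and LNet is assumed to be already running, as in step 1 above.

```shell
#!/bin/sh
# configure_client_routes (hypothetical helper): configure a routed client
# with a local tcp network and o2ib routes via the routers' tcp NIDs.
# No "lnetctl peer add" is issued; the route commands alone are used.
# Assumes LNet is already configured (e.g. the lnet service is running).
# Usage: configure_client_routes IFACE GATEWAY_NID...
configure_client_routes() {
    iface=$1
    shift
    # Local network the client shares with the routers.
    lnetctl net add --net tcp --if "$iface" || return 1
    # One route to the remote o2ib network per gateway (router tcp NID).
    for gw in "$@"; do
        lnetctl route add --net o2ib --gateway "$gw" || return 1
    done
}
```

Usage: `configure_client_routes eth0 10.4.7.24@tcp 10.4.7.25@tcp`. The helper stops at the first failing command, so a partially applied configuration is reported instead of silently continuing.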