[lustre-discuss] lnetctl fails to recreate exact config when importing exported lnet.conf
angelosching at clustertech.com
Sun Sep 6 06:44:15 PDT 2020
September 5, 2020 1:04 AM, "Mohr Jr, Richard Frank" <rmohr at utk.edu> wrote:
> So your server has both tcp and o2ib NIDs, and you
> want the server to route requests from tcp clients to other resources on the o2ib network. But when
> you mount Lustre, you want the client to use the server’s o2ib NID instead of mounting with the
> server’s tcp NID.
Actually, the pair of LNet routers are themselves also serving as MDS and OSS, and there are 4 more MDS/OSS nodes, only on o2ib, serving yet another file system. With the extraneous peer added by lnetctl route add, the LNet router would print the following kernel message:
    LNetError: 34250:0:(lib-move.c:4259:lnet_parse()) 10.4.7.145@tcp, src 10.4.7.145@tcp: Bad dest nid 10.1.4.24@o2ib (it's my nid but on a different network)
I worked around this by manually adding the routers as peers with both of their NIDs before running lnetctl route add; whether o2ib or tcp is used as the primary NID does not seem to matter. I also just discovered that running lnetctl discover against the router's TCP NID, either before or after route add, also yields a usable LNet. Having found the latter workaround, I've implemented it using a systemd drop-in for the lnet.service unit.
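The post does not show the drop-in itself. As a sketch only, the hypothetical Python helper below renders one under assumptions that are not from the original: lnetctl installed at /usr/sbin/lnetctl, ExecStartPost= used so that discovery runs after lnet.service's ExecStart commands have finished, and placeholder router TCP NIDs taken from the 192.0.2.0/24 documentation range.

```python
"""Hypothetical helper that renders a systemd drop-in for lnet.service.

Assumptions (not from the original post): lnetctl lives at /usr/sbin/lnetctl,
and ExecStartPost= is used so `lnetctl discover` runs after lnet.service's
ExecStart commands complete. The router NIDs used below are placeholders.
"""

from pathlib import Path

LNETCTL = "/usr/sbin/lnetctl"  # assumed install path
DROPIN_DIR = Path("/etc/systemd/system/lnet.service.d")  # systemd drop-in directory


def render_dropin(router_tcp_nids: list[str]) -> str:
    """Return drop-in text that runs `lnetctl discover` on each router's TCP NID."""
    if not router_tcp_nids:
        raise ValueError("need at least one router NID")
    lines = ["[Service]"]
    for nid in router_tcp_nids:
        # LNet NIDs have the form address@network, e.g. 192.0.2.1@tcp.
        if "@" not in nid:
            raise ValueError(f"not an LNet NID: {nid!r}")
        # One ExecStartPost= line per router; systemd runs them in order.
        lines.append(f"ExecStartPost={LNETCTL} discover {nid}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder NIDs; substitute the routers' real TCP NIDs.
    text = render_dropin(["192.0.2.1@tcp", "192.0.2.2@tcp"])
    print(f"# would be written to {DROPIN_DIR / 'discover-routers.conf'}")
    print(text, end="")
```

The output would be saved as, for example, /etc/systemd/system/lnet.service.d/discover-routers.conf, followed by systemctl daemon-reload. Prefixing the command with "-" (ExecStartPost=-/usr/sbin/lnetctl ...) would stop a failed discovery from marking lnet.service as failed; whether that is desirable is a site decision.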
E: angelosching at clustertech.com
A: 210-213, Lake Side 1, Science Park, Hong Kong