[lustre-discuss] lnetctl fails to recreate exact config when importing exported lnet.conf

Angelos Ching angelosching at clustertech.com
Thu Sep 3 21:11:04 PDT 2020


Dear all,

I think I've encountered a bug in lnetctl, but I'm not sure where to submit
a bug report.

Summary:
It's expected that the LNet config on a node can be recreated at
lnet.service startup by saving the config using: lnetctl export
--backup > /etc/lnet.conf
However, the ordering of sections within the YAML file causes extraneous
NIDs to be created when used in combination with routing, which breaks
LNet routing / node communication. The server side dmesg shows "Bad dest
nid n.n.n.n@o2ib (it's my nid but on a different network)"

Environment:
Client: CentOS 7.8, Lustre 2.12.5-ib, MLNX OFED 4.9-0.1.7.1
Lnet router + server: CentOS 7.7, Lustre 2.12.4-ib, MLNX OFED 4.7-3.2.9.0

Steps to reproduce:
(Listing 1) Server side Lnet config (peer list omitted for conciseness): 
https://pastebin.com/DH6HAt5a
(Listing 2) Full command listing and output on client side is reproduced 
here: https://pastebin.com/h3wHyCM7

All steps below carried out on Lustre client:

1. Restart lnet service with empty /etc/lnet.conf
2. lnetctl net add: TCP network using Ethernet
3. lnetctl peer add: 2 peers with "Lnet router + server"@o2ib,tcp NIDs
4. lnetctl route add: 2 gateways to the o2ib network using "Lnet router +
server"@tcp NID
5. lnetctl export: with --backup to /etc/lnet.conf; check the saved file 
and confirm Lnet is configured with 2 peers and 2 gateways (Listing 2: 
37-47)
6. Mount o2ib exported Lustre volume and confirm volume functioning 
correctly; unmount volume
7. Restart lnet.service and check the lnet configuration; it now contains
2 extra peer entries that reference only the TCP NID of the "Lnet router +
server", alongside the 2 manually configured peers that reference both
o2ib and tcp NIDs (Listing 2: 75-93)
8. Client fails to mount the o2ib exported volume; the server side kernel
message shows "Bad dest nid n.n.n.n@o2ib (it's my nid but on a different
network)"

9. If we reorder the peer list to go before the route list in
/etc/lnet.conf (Listing 2: 16), then lnet is properly configured
with 2 peers on service restart and everything works as expected.
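
For anyone who wants to apply the reordering from step 9 without editing the file by hand, here is a minimal sketch in Python. It is not part of lnetctl. It treats the file as plain text rather than parsing YAML, and it assumes that the exported file has top-level "route:" and "peer:" sections starting in column 0, with everything indented beneath them belonging to that section. The function names, the sample content, and the addresses in it are made up for illustration.

```python
import re

# Assumption: top-level sections start at column 0 as "name:" (for example
# "route:" or "peer:"); indented lines below belong to that section.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][\w-]*):")


def split_sections(text):
    """Return (preamble_lines, [(key, lines), ...]) for top-level blocks."""
    preamble, sections = [], []
    for line in text.splitlines():
        match = TOP_LEVEL_KEY.match(line)
        if match:
            sections.append((match.group(1), [line]))
        elif sections:
            sections[-1][1].append(line)
        else:
            preamble.append(line)
    return preamble, sections


def peers_before_routes(text):
    """Move "peer" blocks ahead of the first "route" block if they follow it."""
    preamble, sections = split_sections(text)
    keys = [key for key, _ in sections]
    if "peer" not in keys or "route" not in keys:
        return text
    if keys.index("peer") < keys.index("route"):
        return text  # already in the order that worked in step 9
    peers = [s for s in sections if s[0] == "peer"]
    others = [s for s in sections if s[0] != "peer"]
    at = [key for key, _ in others].index("route")
    ordered = others[:at] + peers + others[at:]
    lines = preamble + [line for _, block in ordered for line in block]
    return "\n".join(lines) + "\n"


# Hypothetical, heavily shortened stand-in for an exported config.
SAMPLE = """\
net:
    - net type: tcp
route:
    - net: o2ib
      gateway: 192.0.2.11@tcp
peer:
    - primary nid: 192.0.2.11@o2ib
"""

print(peers_before_routes(SAMPLE))
```

To use it on a real node, one would read /etc/lnet.conf, pass the contents to peers_before_routes(), and write the result back after keeping a copy of the original. Only the order of the sections changes; the lines themselves are left untouched.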

Best regards,

-- 
Angelos Ching
ClusterTech Limited

Tel     : +852-2655-6138
Fax     : +852-2994-2101
Address	: Unit 211-213, Lakeside 1, 8 Science Park West Ave., Shatin, Hong Kong

Got praises or room for improvements? http://bit.ly/TellAngelos
