[lustre-discuss] lnetctl fails to recreate exact config when importing exported lnet.conf
Angelos Ching
angelosching at clustertech.com
Fri Sep 4 03:35:55 PDT 2020
Hi Aurélien,
Could you point me to whom I should send my Jira account request?
Thanks,
Angelos
(Sent from mobile, please pardon me for typos and cursoriness.)
> On 2020/09/04 16:01, Degremont, Aurelien <degremoa at amazon.com> wrote:
>
> Hi Angelos,
>
> Bug reports could be made at https://jira.whamcloud.com/
>
>
> Aurélien
>
> On 04/09/2020 06:11, "lustre-discuss on behalf of Angelos Ching" <lustre-discuss-bounces at lists.lustre.org on behalf of angelosching at clustertech.com> wrote:
>
> Dear all,
>
> I think I've encountered a bug in lnetctl, but I'm not sure where to
> submit a bug report:
>
> Summary:
> It's expected that the LNet config on a node can be recreated on
> lnet.service startup by saving the config using: lnetctl export
> --backup > /etc/lnet.conf
> But the ordering within the YAML file causes extraneous NIDs to be created
> when used in combination with routing, thus breaking LNet routing / node
> communication, with server-side dmesg showing "Bad dest nid n.n.n.n@o2ib
> (it's my nid but on a different network)"
>
> Environment:
> Client: CentOS 7.8, Lustre 2.12.5-ib, MLNX OFED 4.9-0.1.7.1
> LNet router + server: CentOS 7.7, Lustre 2.12.4-ib, MLNX OFED 4.7-3.2.9.0
>
> Steps to reproduce:
> (Listing 1) Server-side LNet config (peer list omitted for conciseness):
> https://pastebin.com/DH6HAt5a
> (Listing 2) Full command listing and output on client side is reproduced
> here: https://pastebin.com/h3wHyCM7
>
> All steps below carried out on Lustre client:
>
> 1. Restart lnet service with an empty /etc/lnet.conf
> 2. lnetctl net add: add a TCP network using Ethernet
> 3. lnetctl peer add: add 2 peers using the "LNet router + server" @o2ib
> and @tcp NIDs
> 4. lnetctl route add: add 2 gateways to the o2ib network using the "LNet
> router + server" @tcp NIDs
> 5. lnetctl export: run with --backup, saving to /etc/lnet.conf; check the
> saved file and confirm LNet is configured with 2 peers and 2 gateways
> (Listing 2: 37-47)
> 6. Mount the o2ib exported Lustre volume and confirm the volume functions
> correctly; unmount the volume
> 7. Restart lnet.service and check the LNet configuration; it now shows 2
> extra peer entries that reference only the tcp NID of the "LNet router +
> server", along with the 2 manually configured peers that reference both
> o2ib and tcp NIDs (Listing 2: 75-93)
> 8. The client fails to mount the o2ib exported volume; the server-side
> kernel message shows "Bad dest nid n.n.n.n@o2ib (it's my nid but on a
> different network)"
>
> 9. If we reorder the peer list to go before the route list in
> /etc/lnet.conf (Listing 2: 16), then LNet is properly configured
> with 2 peers on service restart and everything works as expected.
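
The manual reordering described in step 9 could be scripted. The following is only a minimal Python sketch, not part of lnetctl and not a fix for the underlying bug. It assumes the exported file uses top-level YAML keys named "peer:" and "route:", each starting in column 0, and it works on plain text so that no YAML library is needed. The function names are hypothetical.

```python
import re

# Assumption: top-level sections of lnet.conf look like "net:", "route:",
# "peer:", i.e. a bare key in column 0 with nothing but an optional comment after it.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][\w-]*):\s*(#.*)?$")


def split_sections(text):
    """Split the file into (key, lines) blocks, one per top-level key.

    Anything before the first top-level key is kept under the key None
    so it is written back unchanged.
    """
    sections = []
    key, lines = None, []
    for line in text.splitlines(keepends=True):
        if not line.endswith("\n"):
            line += "\n"  # keep blocks safe to reorder
        match = TOP_LEVEL_KEY.match(line)
        if match:
            sections.append((key, lines))
            key, lines = match.group(1), []
        lines.append(line)
    sections.append((key, lines))
    return [(k, block) for k, block in sections if block]


def peers_before_routes(text):
    """Return text with the 'peer' section moved ahead of the 'route' section."""
    sections = split_sections(text)
    keys = [k for k, _ in sections]
    if "peer" not in keys or "route" not in keys:
        return text  # nothing to reorder
    peer_idx, route_idx = keys.index("peer"), keys.index("route")
    if peer_idx < route_idx:
        return text  # already in the working order
    sections.insert(route_idx, sections.pop(peer_idx))
    return "".join("".join(block) for _, block in sections)
```

One possible use is to read /etc/lnet.conf, pass its contents through peers_before_routes(), and write the result back after keeping a copy of the original file. All other sections stay in their original relative order.
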
>
> Best regards,
>
> --
> Angelos Ching
> ClusterTech Limited
>
> Tel : +852-2655-6138
> Fax : +852-2994-2101
> Address : Unit 211-213, Lakeside 1, 8 Science Park West Ave., Shatin, Hong Kong
>
> Got praises or room for improvements? http://bit.ly/TellAngelos
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>