[lustre-discuss] lnetctl fails to recreate exact config when importing exported lnet.conf

Peter Jones pjones at whamcloud.com
Fri Sep 4 05:26:44 PDT 2020


info at whamcloud.com


On 2020-09-04, 3:36 AM, "lustre-discuss on behalf of Angelos Ching" <lustre-discuss-bounces at lists.lustre.org on behalf of angelosching at clustertech.com> wrote:

    Hi Aurélien,
    
    May I have some pointers on whom I should send my Jira account request to?
    
    Thanks,
    Angelos
    (Sent from mobile, please pardon me for typos and cursoriness.)
    
    > On 2020/09/04 at 16:01, Degremont, Aurelien <degremoa at amazon.com> wrote:
    > 
    > Hi Angelos,
    > 
    > Bug reports can be filed at https://jira.whamcloud.com/
    > 
    > 
    > Aurélien
    > 
    > On 04/09/2020 at 06:11, "lustre-discuss on behalf of Angelos Ching" <lustre-discuss-bounces at lists.lustre.org on behalf of angelosching at clustertech.com> wrote:
    > 
    > 
    >    Dear all,
    > 
    >    I think I've encountered a bug in lnetctl, but I'm not sure where to
    >    submit a bug report:
    > 
    >    Summary:
    >    It's expected that the Lnet config on a node can be recreated on
    >    lnet.service startup by saving the config using: lnetctl export
    >    --backup > /etc/lnet.conf
    >    But the ordering of sections within the YAML file causes extraneous NIDs
    >    to be created when used in combination with routing, thus breaking Lnet
    >    routing / node communication, with server-side dmesg showing "Bad dest
    >    nid n.n.n.n@o2ib (it's my nid but on a different network)"
    > 
    >    Environment:
    >    Client: CentOS 7.8, Lustre 2.12.5-ib, MLNX OFED 4.9-0.1.7.1
    >    Lnet router + server: CentOS 7.7, Lustre 2.12.4-ib, MLNX OFED 4.7-3.2.9.0
    > 
    >    Steps to reproduce:
    >    (Listing 1) Server side Lnet config (peer list omitted for conciseness):
    >    https://pastebin.com/DH6HAt5a
    >    (Listing 2) Full command listing and output on client side is reproduced
    >    here: https://pastebin.com/h3wHyCM7
    > 
    >    All steps below were carried out on the Lustre client:
    > 
    >    1. Restart lnet service with empty /etc/lnet.conf
    >    2. lnetctl net add: TCP network using Ethernet
    >    3. lnetctl peer add: 2 peers with "Lnet router + server"@o2ib,tcp NIDs
    >    4. lnetctl route add: 2 gateways to o2ib network using "Lnet router +
    >    server"@TCP NID
    >    5. lnetctl export: with --backup to /etc/lnet.conf; check the saved file
    >    and confirm Lnet is configured with 2 peers and 2 gateways (Listing 2:
    >    37-47)
    >    6. Mount o2ib exported Lustre volume and confirm volume functioning
    >    correctly; unmount volume
    >    7. Restart lnet.service and check the Lnet configuration; it now shows 2
    >    extra peer entries that reference only the TCP NID of the "Lnet router +
    >    server", along with the 2 manually configured peers that reference both
    >    o2ib and tcp NIDs (Listing 2: 75-93)
    >    8. The client fails to mount the o2ib exported volume; the server-side
    >    kernel message shows "Bad dest nid n.n.n.n@o2ib (it's my nid but on a
    >    different network)"
    > 
    >    9. If we reorder the peer list to go before the route list in
    >    /etc/lnet.conf (Listing 2: 16), then Lnet is properly configured
    >    with 2 peers on service restart and everything works as expected.
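
The reordering described in step 9 could be scripted rather than done by hand. The following is only a minimal sketch, assuming the exported file uses top-level YAML keys named `peer` and `route` that start in column 0; the function names and the sample NIDs are hypothetical and are not taken from the pastebin listings. It works on plain text, so it needs nothing beyond the Python standard library and does not touch /etc/lnet.conf itself.

```python
import re

# Assumption: top-level sections of the exported config start in column 0,
# e.g. "net:", "route:", "peer:", "global:".
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][\w-]*):")


def split_sections(text):
    """Split YAML text into (key, lines) blocks; key is None for any preamble."""
    sections = [(None, [])]
    for line in text.splitlines(keepends=True):
        match = TOP_LEVEL_KEY.match(line)
        if match:
            sections.append((match.group(1), [line]))
        else:
            sections[-1][1].append(line)
    return sections


def peers_before_routes(text):
    """Return text with the 'peer' section moved ahead of the 'route' section."""
    if text and not text.endswith("\n"):
        text += "\n"  # keep moved blocks from running into each other
    sections = split_sections(text)
    keys = [key for key, _ in sections]
    if "peer" in keys and "route" in keys:
        peer_idx, route_idx = keys.index("peer"), keys.index("route")
        if peer_idx > route_idx:
            sections.insert(route_idx, sections.pop(peer_idx))
    return "".join("".join(lines) for _, lines in sections)


# Hypothetical sample resembling an exported config with routes listed first.
SAMPLE = (
    "net:\n"
    "    - net type: tcp\n"
    "route:\n"
    "    - net: o2ib\n"
    "      gateway: 192.168.0.1@tcp\n"
    "peer:\n"
    "    - primary nid: 192.168.0.1@o2ib\n"
    "global:\n"
    "    discovery: 1\n"
)

if __name__ == "__main__":
    print(peers_before_routes(SAMPLE), end="")
```

One would presumably run this against a copy of the exported file and inspect the result before replacing /etc/lnet.conf. All other sections keep their original relative order, and a file that already lists peers first is returned unchanged.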
    > 
    >    Best regards,
    > 
    >    --
    >    Angelos Ching
    >    ClusterTech Limited
    > 
    >    Tel     : +852-2655-6138
    >    Fax     : +852-2994-2101
    >    Address : Unit 211-213, Lakeside 1, 8 Science Park West Ave., Shatin, Hong Kong
    > 
    >    Got praises or room for improvements? http://bit.ly/TellAngelos
    > 
    >    _______________________________________________
    >    lustre-discuss mailing list
    >    lustre-discuss at lists.lustre.org
    >    http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
    > 
    