[lustre-discuss] Updating mgsnode IP command completes successfully, but old IP remains

Ricardo Brugman rbrugman at neoleukin.com
Tue Dec 7 11:02:30 PST 2021


Thank you Thomas and Nathan for your responses.

Some more information regarding the setup:

Lustre version: 2.12.0

There are two Lustre nodes, each with four InfiniBand interfaces (NIDs), and there’s only one MGS, which runs on the first Lustre node.

The four NIDs correspond to the four IPs in the mgsnode specification I shared earlier (i.e. .201, .202, etc.), and all of them belong to the first Lustre node. So each one is not a separate failover partner, just a separate IB interface on the same host.

There is no failover configured for the MGS.

@Thomas, I did not come across the --servicenode syntax in the information that I found, but I’ll look into this and use it for the new virtualized Lustre environment I’m building.

Thanks again for your help and insights,

Ricardo


From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Nathan Dauchy - NOAA Affiliate via lustre-discuss <lustre-discuss at lists.lustre.org>
Date: Monday, December 6, 2021 at 7:25 AM
To: lustre-discuss <lustre-discuss at lists.lustre.org>
Subject: Re: [lustre-discuss] Updating mgsnode IP command completes successfully, but old IP remains

Ricardo,

Your --mgsnode specification with all commas implies that you have four NIDs on a single host. But the rest of your writeup indicates two hosts.

From the Lustre manual, "13.12. Specifying NIDs and Failover":
Where multiple NIDs are specified separated by commas (for example, 10.67.73.200@tcp,192.168.10.1@tcp), the two NIDs refer to the same host, and the Lustre software chooses the best one for communication. When a pair of NIDs is separated by a colon (for example, 10.67.73.200@tcp:10.67.73.201@tcp), the two NIDs refer to two different hosts and are treated as a failover pair (the Lustre software tries the first one, and if that fails, it tries the second one.)
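
The comma/colon rule quoted above can be illustrated with a small Python sketch. It is not Lustre code: `parse_nid_spec` is a hypothetical helper that only mirrors the rule as stated (colons separate hosts, commas separate NIDs on one host), assumes IPv4-style NIDs written with `@`, and does not validate anything.

```python
def parse_nid_spec(spec):
    """Split a NID specification into hosts, each a list of NIDs.

    Colons separate different hosts (failover alternatives);
    commas separate NIDs that belong to the same host.
    """
    hosts = []
    for host_part in spec.split(":"):
        # Drop empty pieces so stray separators do not create empty hosts.
        nids = [nid.strip() for nid in host_part.split(",") if nid.strip()]
        if nids:
            hosts.append(nids)
    return hosts


# Commas only: one host reachable through two NIDs.
same_host = parse_nid_spec("10.67.73.200@tcp,192.168.10.1@tcp")

# Colon: two different hosts, treated as a failover pair.
failover_pair = parse_nid_spec("10.67.73.200@tcp:10.67.73.201@tcp")

print(len(same_host), "host(s):", same_host)
print(len(failover_pair), "host(s):", failover_pair)
```

Read this way, a specification of four comma-separated NIDs describes a single host with four interfaces, not four hosts.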

Hope this helps,
Nathan

On Sat, Dec 4, 2021 at 5:27 AM Thomas Roth <t.roth at gsi.de> wrote:
Dear Ricardo,

Perhaps the problem is the syntax of the --mgsnode specification?

Which Lustre version are you running? There might have been changes in the way mgsnodes are specified.

And the four NIDs you mentioned, are these all failover partners? Or DNS nodes?

Example from our site:
We have three MDS, each a pair of active server and failover partner.
The format command for the first (MGS+MDT0) read (under Lustre 2.10.6):
 > ... --servicenode=10.20.3.0@o2ib5 --servicenode=10.20.3.1@o2ib5 --mgsnode=10.20.3.0@o2ib5 --mgsnode=10.20.3.1@o2ib5 ...
No comma, no colon.
The format command for the second (MDT1) read:
 > ... --servicenode=10.20.2.236@o2ib5 --servicenode=10.20.2.237@o2ib5 --mgsnode=10.20.3.0@o2ib5 --mgsnode=10.20.3.1@o2ib5 ...
Obviously the servicenodes are the IPs of MDT1 and its failover partner, the mgsnodes are again the IPs of MGS and its partner.
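
As a rough illustration of the pattern above (one option per NID, with no commas or colons), here is a Python sketch that assembles the repeated options. `build_target_options` is a hypothetical helper, not part of any Lustre tool. It only produces the `--servicenode` and `--mgsnode` strings; the rest of the format command, elided in the example above, stays elided here.

```python
def build_target_options(service_nids, mgs_nids):
    """Return one --servicenode option per service NID and
    one --mgsnode option per MGS NID, in that order."""
    opts = [f"--servicenode={nid}" for nid in service_nids]
    opts += [f"--mgsnode={nid}" for nid in mgs_nids]
    return opts


# NIDs taken from the MDT1 example: the target's own server pair,
# followed by the MGS and its failover partner.
mdt1_opts = build_target_options(
    ["10.20.2.236@o2ib5", "10.20.2.237@o2ib5"],
    ["10.20.3.0@o2ib5", "10.20.3.1@o2ib5"],
)
print(" ".join(mdt1_opts))
```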


Regards,
Thomas

On 11/30/21 19:05, Ricardo Brugman wrote:
> Hi all,
>
> I’ve seen many questions/issues come by, and I decided to post an issue that I encountered.
>
> Recently I tried updating the mgsnode IP address on a Lustre node, and although the command executed successfully, the old IP value remained.
>
> Old value: 10.10.10.2 (points to a server that is not a mgsnode)
> New value: 10.10.10.201@o2ib,10.10.10.202@o2ib,10.10.10.203@o2ib,10.10.10.204@o2ib
>
> Please find the command and output below:
>
> [root@xxx ~]# tunefs.lustre --erase-param mgsnode --writeconf --mgsnode=10.10.10.201@o2ib,10.10.10.202@o2ib,10.10.10.203@o2ib,10.10.10.204@o2ib zfs_R10_nvme0-4/dne_mdt1
> checking for existing Lustre data: found
>
>     Read previous values:
> Target:     neohpfs-MDT0001
> Index:      1
> Lustre FS:  neohpfs
> Mount type: zfs
> Flags:      0x1
>                (MDT )
> Persistent mount opts:
> Parameters: mgsnode=10.10.10.2@o2ib
>
>     Permanent disk data:
> Target:     neohpfs=MDT0001
> Index:      1
> Lustre FS:  neohpfs
> Mount type: zfs
> Flags:      0x141
>                (MDT update writeconf )
> Persistent mount opts:
> Parameters:  mgsnode=:10.10.10.201@o2ib,10.10.10.202@o2ib,10.10.10.203@o2ib,10.10.10.204@o2ib
> [root@xxx ~]#
>
> I did restart the Lustre service, thinking this would load the new value/config. The service came up successfully, but it still had not loaded the new value.
>
> I would appreciate any help or suggestions as to why the new value was not saved/loaded. If I made a mistake or followed the wrong step(s)/process, then please feel free to point that out.
>
> Best Regards,
> Ricardo
>
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>