[lustre-discuss] Multihoming Lustre server

Mohr Jr, Richard Frank (Rick Mohr) rmohr at utk.edu
Tue Oct 16 14:18:08 PDT 2018


> On Oct 16, 2018, at 7:04 AM, Mark Roper <markroper at gmail.com> wrote:
> 
> I have successfully set up a Lustre filesystem that is multi-homed on two different TCP NIDs, using the following configuration.
> Mount MGS & MDT
> 
>    sudo lnetctl lnet configure
>    sudo lnetctl net del --net tcp
>    sudo lnetctl net add --net tcp0 --if eth1
>    sudo lnetctl net add --net tcp1 --if eth0
>    sudo zpool create -O canmount=off -o ashift=12 mdtPool0 /dev/device1
>    sudo mkfs.lustre --mgs \
>       --mdt \
>       --servicenode 10.0.1.109@tcp0,172.30.0.228@tcp1 \
>       --backfstype=zfs --fsname=demo --index=0 mdtPool0/mdt0 /dev/device1
>    sudo sh -c 'echo "$(hostname) - demo:MDT0000 zfs:mdtPool0/mdt0" >> /etc/ldev.conf'
>    sudo service lustre start
> 
> Mount an OST
> 
>    sudo lnetctl lnet configure
>    sudo lnetctl net del --net tcp
>    sudo lnetctl net add --net tcp0 --if eth1
>    sudo lnetctl net add --net tcp1 --if eth0
>    sudo zpool create -O canmount=off -o ashift=12 ostPool0 /dev/device1
>    sudo mkfs.lustre --reformat --ost --backfstype=zfs --fsname=demo --index=0 \
>        --servicenode 10.0.6.156@tcp0,172.30.0.250@tcp1 --mgsnode=172.30.0.228@tcp1 ostPool0/ost0 /dev/device1
>    sudo sh -c 'echo "$(hostname) - demo:OST0000 zfs:ostPool0/ost0" >> /etc/ldev.conf'
>    sudo service lustre start
> 
> I can mount and use this file system on either tcp0 or tcp1.  However, there are times when network traffic on my tcp1 network is blocked.  If the tcp1 LNET network is blocked while running mkfs.lustre for an OST, the mount of the OST fails.

That is because you specified only one NID for the mgsnode option, and that NID is on tcp1.  If tcp1 is not available, the OSS doesn’t know how to contact the MGS to register the OST.  Have you tried using "--mgsnode=10.0.1.109@tcp0,172.30.0.228@tcp1" to see if that works?
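As a rough sketch of that suggestion, the OST format command from the original post could be assembled with both MGS NIDs as shown below. The NIDs, pool, and device names are copied from the thread (the archive renders "@" as " at "); the variable names are only illustrative. The script just prints the command for review and does not run it, since mkfs.lustre --reformat is destructive.

```shell
#!/bin/sh
# NIDs copied from the thread; each list names two interfaces of ONE server.
MGS_NIDS="10.0.1.109@tcp0,172.30.0.228@tcp1"
OSS_NIDS="10.0.6.156@tcp0,172.30.0.250@tcp1"

# Assemble the OST format command from the original post, with both MGS NIDs.
CMD="mkfs.lustre --reformat --ost --backfstype=zfs --fsname=demo --index=0 \
--servicenode $OSS_NIDS --mgsnode=$MGS_NIDS ostPool0/ost0 /dev/device1"

# Print for review; this sketch never runs the (destructive) command itself.
echo "sudo $CMD"
```

With a comma-separated list, both NIDs are treated as addresses of the same MGS node, so the OSS should be able to reach it over tcp0 when tcp1 is blocked.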

> If I attempt to use mkfs.lustre to go back and update the MGS & MDT service nodes to support both LNET NIDs after mounting the OSTs, the command succeeds, but the file system is not mountable from the client.

You can’t use mkfs.lustre to update service node NIDs once the file system is formatted.  You would need to perform a writeconf or use the "lctl replace_nids" command.  (See the “Changing a Server NID” section of the Lustre manual.)
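For illustration only, the replace_nids route might look roughly like the dry-run sketch below. It follows my reading of the manual's procedure for a combined MGS/MDT, so please check it against the manual for your Lustre version before running anything. The NIDs and dataset name come from the thread; the mount point /mnt/mdt and the run helper are hypothetical, and the target name demo-MDT0000 is derived from the fsname and index used above.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

MGS_NIDS="10.0.1.109@tcp0,172.30.0.228@tcp1"   # from the thread
MDT_DATASET="mdtPool0/mdt0"                     # from the thread
MDT_MOUNT="/mnt/mdt"                            # hypothetical mount point

# Clients and all targets must already be unmounted at this point.
# 1. Start only the MGS on the combined MGS/MDT target.
run mount -t lustre -o nosvc "$MDT_DATASET" "$MDT_MOUNT"
# 2. Rewrite the NIDs recorded for the MDT in the configuration logs.
run lctl replace_nids demo-MDT0000 "$MGS_NIDS"
# 3. Stop the MGS again; the targets can then be remounted normally.
run umount "$MDT_MOUNT"
```

Setting DRY_RUN=0 would execute the commands instead of printing them; that should only be done as root on the MGS node after confirming each step.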

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu


