[lustre-discuss] Multihoming Lustre server

Michael Di Domenico mdidomenico4 at gmail.com
Tue Oct 16 05:19:35 PDT 2018


Can you expand on this part: "However, there are times when network
traffic on my tcp1 network is blocked.  If the tcp1 LNET network is
network blocked while running mkfs.lustre"?  I'm not an expert by any
stretch, but this sounds like a recipe for disaster.
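
For context, rc = -110 in the quoted log is ETIMEDOUT on Linux, and the
OST below is formatted with only the tcp1 NID as its --mgsnode, so one
plausible reading is that the OST simply cannot reach the MGS while tcp1
is blocked.  A minimal sketch of a pre-mount check follows.  It assumes
lctl is in PATH, LNet is already configured, and "lctl ping" exits
non-zero when a NID does not answer; check_mgs_nids is a hypothetical
helper name, not part of Lustre.

```shell
#!/bin/sh
# Sketch only: report which MGS NIDs answer an LNet ping.
# check_mgs_nids is a hypothetical helper, not a Lustre command.
# Assumption: "lctl ping <nid>" exits non-zero when the NID is unreachable.

# Succeeds (returns 0) if at least one of the given NIDs is reachable.
check_mgs_nids() {
    reachable=1
    for nid in "$@"; do
        if lctl ping "$nid" >/dev/null 2>&1; then
            echo "reachable: $nid"
            reachable=0
        else
            echo "unreachable: $nid"
        fi
    done
    return "$reachable"
}
```

It could be run on the OSS before starting the target, for example:
check_mgs_nids 10.0.1.109@tcp0 172.30.0.228@tcp1 && sudo service lustre start
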
On Tue, Oct 16, 2018 at 7:05 AM Mark Roper <markroper at gmail.com> wrote:
>
> Lustre Community,
>
> I have successfully set up a Lustre filesystem that is multi-homed on two different TCP NIDs, using the following configuration.
>
> Mount MGS & MDT
>
>    sudo lnetctl lnet configure
>    sudo lnetctl net del --net tcp
>    sudo lnetctl net add --net tcp0 --if eth1
>    sudo lnetctl net add --net tcp1 --if eth0
>    sudo zpool create -O canmount=off -o ashift=12 mdtPool0 /dev/device1
>    sudo mkfs.lustre --mgs \
>       --mdt \
>       --servicenode 10.0.1.109@tcp0,172.30.0.228@tcp1 \
>       --backfstype=zfs --fsname=demo --index=0 mdtPool0/mdt0 /dev/device1
>    sudo sh -c 'echo "$(hostname) - demo:MDT0000 zfs:mdtPool0/mdt0" >> /etc/ldev.conf'
>    sudo service lustre start
>
> Mount an OST
>
>    sudo lnetctl lnet configure
>    sudo lnetctl net del --net tcp
>    sudo lnetctl net add --net tcp0 --if eth1
>    sudo lnetctl net add --net tcp1 --if eth0
>    sudo zpool create -O canmount=off -o ashift=12 ostPool0 /dev/device1
>    sudo mkfs.lustre --reformat --ost --backfstype=zfs --fsname=demo --index=0 \
>        --servicenode 10.0.6.156@tcp0,172.30.0.250@tcp1 --mgsnode=172.30.0.228@tcp1 ostPool0/ost0 /dev/device1
>    sudo sh -c 'echo "$(hostname) - demo:OST0000 zfs:ostPool0/ost0" >> /etc/ldev.conf'
>    sudo service lustre start
>
> I can mount and use this file system on either tcp0 or tcp1.  However, there are times when network traffic on my tcp1 network is blocked.  If the tcp1 LNET network is network blocked while running mkfs.lustre for an OST, the mount of the OST fails.  Running journalctl -xe yields:
>
> kernel: LustreError: 15f-b: demo-OST0000: cannot register this server with the MGS: rc = -110. Is the MGS running?
> kernel: LustreError: 5798:0:(obd_mount_server.c:1936:server_fill_super()) Unable to start targets: -110
> kernel: LustreError: 5798:0:(obd_mount_server.c:1586:server_put_super()) no obd demo-OST0000
> kernel: LustreError: 5798:0:(obd_mount_server.c:132:server_deregister_mount()) demo-OST0000 not registered
> kernel: Lustre: server umount demo-OST0000 complete
> kernel: LustreError: 5798:0:(obd_mount.c:1599:lustre_fill_super()) Unable to mount  (-110)
>
>
> If I exclude the tcp1 servicenode when I mount the MDT, I am able to mount the OSTs on both tcp0 and tcp1.  If I attempt to use mkfs.lustre to go back and update the MGS & MDT service nodes to support both LNET NIDs after mounting the OSTs, the command succeeds, but the file system is not mountable from the client.
>
> Is there a way to reliably stand up a filesystem in this configuration such that the mkfs.lustre command succeeds and the tcp1 LNET network becomes functional once its traffic is no longer blocked?  Or must all LNET networks be functional at the time the mkfs.lustre commands for the server components are run?
>
> Many thanks!
>
> Mark Roper

