[lustre-discuss] Multihoming Lustre server

Mark Roper markroper at gmail.com
Tue Oct 16 04:04:40 PDT 2018


Lustre Community,

I have successfully set up a Lustre filesystem that is multi-homed on two
different TCP NIDs, using the following configuration.

*Mount MGS & MDT*

   sudo lnetctl lnet configure
   sudo lnetctl net del --net tcp
   sudo lnetctl net add --net tcp0 --if eth1
   sudo lnetctl net add --net tcp1 --if eth0
   sudo zpool create -O canmount=off -o ashift=12 mdtPool0 /dev/device1
   sudo mkfs.lustre --mgs \
      --mdt \
      --servicenode 10.0.1.109@tcp0,172.30.0.228@tcp1 \
      --backfstype=zfs --fsname=demo --index=0 mdtPool0/mdt0 /dev/device1
   sudo sh -c 'echo "$(hostname) - demo:MDT0000 zfs:mdtPool0/mdt0" >> /etc/ldev.conf'
   sudo service lustre start
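
To confirm that both NIDs passed to --servicenode are actually configured in LNet on this node, a check along these lines may help. It is a sketch only: have_nids is a hypothetical helper, not part of Lustre. It assumes `lctl list_nids` prints one NID per line and that LNet prints network number 0 without the digit (tcp0 shows up as tcp).

```shell
# Hypothetical helper: succeed only if every expected NID appears on stdin,
# one NID per line (the format assumed for `lctl list_nids` output).
# Assumption: network number 0 is printed without the digit, so both sides
# are normalized from "@tcp0" to "@tcp" before comparing.
have_nids() {
    listed=$(sed 's/@tcp0$/@tcp/')
    for want in "$@"; do
        want=$(printf '%s\n' "$want" | sed 's/@tcp0$/@tcp/')
        # -F: fixed string, -x: whole-line match, -q: quiet
        printf '%s\n' "$listed" | grep -Fxq "$want" || return 1
    done
}

# Usage on the MGS/MDT node:
#   sudo lctl list_nids | have_nids 10.0.1.109@tcp0 172.30.0.228@tcp1 \
#       || echo "expected NID missing from LNet"
```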

*Mount an OST*

   sudo lnetctl lnet configure
   sudo lnetctl net del --net tcp
   sudo lnetctl net add --net tcp0 --if eth1
   sudo lnetctl net add --net tcp1 --if eth0
   sudo zpool create -O canmount=off -o ashift=12 ostPool0 /dev/device1
   sudo mkfs.lustre --reformat --ost --backfstype=zfs --fsname=demo --index=0 \
       --servicenode 10.0.6.156@tcp0,172.30.0.250@tcp1 \
       --mgsnode=172.30.0.228@tcp1 ostPool0/ost0 /dev/device1
   sudo sh -c 'echo "$(hostname) - demo:OST0000 zfs:ostPool0/ost0" >> /etc/ldev.conf'
   sudo service lustre start

I can mount and use this file system on either tcp0 or tcp1.  However,
there are times when network traffic on my tcp1 network is blocked.  If the
tcp1 LNET network is blocked while running mkfs.lustre for an OST,
the mount of the OST fails.  Running journalctl -xe yields:

kernel: LustreError: 15f-b: demo-OST0000: cannot register this server with the MGS: rc = -110. Is the MGS running?
kernel: LustreError: 5798:0:(obd_mount_server.c:1936:server_fill_super()) Unable to start targets: -110
kernel: LustreError: 5798:0:(obd_mount_server.c:1586:server_put_super()) no obd demo-OST0000
kernel: LustreError: 5798:0:(obd_mount_server.c:132:server_deregister_mount()) demo-OST0000 not registered
kernel: Lustre: server umount demo-OST0000 complete
kernel: LustreError: 5798:0:(obd_mount.c:1599:lustre_fill_super()) Unable to mount  (-110)
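
The rc = -110 in these messages is a negated Linux errno. 110 is ETIMEDOUT, which is consistent with the OST timing out while trying to register with the MGS at 172.30.0.228@tcp1, the only NID given to --mgsnode above, while tcp1 is blocked. A small lookup like the following can make such codes easier to read. lustre_rc_name is a hypothetical helper and covers only a handful of values.

```shell
# Hypothetical helper: name a few Linux errno values that can appear as
# negative return codes in Lustre log lines. Not an exhaustive table.
lustre_rc_name() {
    case "${1#-}" in          # strip a leading minus sign, if any
        2)   echo ENOENT ;;
        19)  echo ENODEV ;;
        107) echo ENOTCONN ;;
        110) echo ETIMEDOUT ;;
        111) echo ECONNREFUSED ;;
        113) echo EHOSTUNREACH ;;
        *)   echo "unknown ($1)" ;;
    esac
}

lustre_rc_name -110   # prints ETIMEDOUT
```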


If I exclude the tcp1 servicenode when I mount the MDT, I am able to mount
the OSTs on both tcp0 and tcp1.  If I attempt to use mkfs.lustre to go back
and update the MGS & MDT service nodes to support both LNET NIDs after
mounting the OSTs, the command succeeds, but the file system is not
mountable from the client.

Is there a way to reliably stand up a filesystem in this configuration such
that the mkfs.lustre command succeeds and the tcp1 LNET network is
functional once its traffic is no longer blocked?  Or must all LNET
networks be functional at the time the mkfs.lustre commands for the server
components are run?
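
A pre-flight check can at least make the blocked-network case explicit before the first mount of a target. The sketch below assumes `lctl ping <nid>` is available on the server and exits non-zero when the NID does not answer. mgs_reachable and the LNET_PING override are hypothetical names introduced here. It only reports reachability and does not address the registration failure itself.

```shell
# Hypothetical helper: print the first MGS NID that answers an LNet ping
# and return 0; return 1 if none of the given NIDs answer.
# LNET_PING defaults to "lctl ping" and is a variable only so that the
# check can be dry-run without Lustre installed.
mgs_reachable() {
    for nid in "$@"; do
        # Left unquoted on purpose so "lctl ping" splits into command + arg.
        if ${LNET_PING:-lctl ping} "$nid" >/dev/null 2>&1; then
            echo "$nid"
            return 0
        fi
    done
    return 1
}

# Usage on the OSS before "service lustre start":
#   mgs_reachable 172.30.0.228@tcp1 10.0.1.109@tcp0 \
#       || echo "no MGS NID reachable; expect rc = -110 at mount"
```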

Many thanks!

Mark Roper