[lustre-discuss] Coordinating cluster start and shutdown?

Tung-Han Hsieh tunghan.hsieh at gmail.com
Sun Dec 10 00:29:38 PST 2023


Dear All,

I can contribute a few simple scripts that coordinate the start / stop of a
whole Lustre file system. Everyone is welcome to use them or modify them to
fit their own systems. Sorry that I have not prepared complete documentation
for these scripts; here I only describe their usage briefly. If you are
interested in more details, I will be happy to answer here.

- server:/opt/lustre/etc/cfs-chome:
   The configuration file for the Lustre file system, which is named
"chome" here. The head node is named "server" and is also one of the Lustre
clients. This file lists all the MGS, MDS, and OSS nodes and the Lustre
clients. If the MGS and MDS have both Ethernet and InfiniBand networks, you
can specify their IP addresses explicitly. If an MDT or OST is formatted
with ZFS, you can list it as well.

- server:/opt/lustre/etc/cfsd:
   The main script, run on the head node, to coordinate the start / stop /
shutdown (emergency shutdown) of the Lustre system. The usage is:
   # cd /opt/lustre/etc/
   # ./cfsd start chome
   # ./cfsd stop chome
   # ./cfsd shutdown

   When doing "start", it performs the following steps (the script uses ssh
to log in to each file server and client to do the mount):
   1. If any of the MDTs/OSTs are based on ZFS, it starts ZFS for those
MDTs/OSTs first.
   2. Mount the MGT, MDTs, and OSTs, in that order.
   3. Mount all the clients.

   When doing "stop", it reverses the above steps to unmount everything.
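
   To make the ordering concrete, here is a minimal Python sketch of how
such a start / stop plan could be built. It is not the attached cfsd script.
The Target and Step classes, the host names, the client mount point
/<fsname>, and the reading of "starts ZFS" as "zpool import" are all
assumptions made for illustration.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Target:
    """One Lustre target (hypothetical layout, not the cfs-chome format)."""
    host: str                       # server that owns the target
    kind: str                       # "mgt", "mdt", or "ost"
    device: str                     # block device or ZFS dataset
    mountpoint: str
    zfs_pool: Optional[str] = None  # set only for ZFS-backed targets


@dataclass(frozen=True)
class Step:
    host: str
    start_cmd: str
    stop_cmd: str


START_ORDER = ("mgt", "mdt", "ost")


def build_steps(targets, clients, fsname, mgs_nid):
    """List the steps in start order; stop is the same list reversed."""
    steps = []
    seen = set()
    # 1. Bring up ZFS first (assumed here to mean importing the pool).
    for t in targets:
        if t.zfs_pool and (t.host, t.zfs_pool) not in seen:
            seen.add((t.host, t.zfs_pool))
            steps.append(Step(t.host, f"zpool import {t.zfs_pool}",
                              f"zpool export {t.zfs_pool}"))
    # 2. Mount MGT, then MDTs, then OSTs.
    for kind in START_ORDER:
        for t in targets:
            if t.kind == kind:
                steps.append(Step(
                    t.host,
                    f"mount -t lustre {t.device} {t.mountpoint}",
                    f"umount {t.mountpoint}"))
    # 3. Mount the clients last.
    for c in clients:
        steps.append(Step(c, f"mount -t lustre {mgs_nid}:/{fsname} /{fsname}",
                          f"umount /{fsname}"))
    return steps


def start_plan(steps):
    return [(s.host, s.start_cmd) for s in steps]


def stop_plan(steps):
    # "stop" walks the same steps backwards with the inverse command.
    return [(s.host, s.stop_cmd) for s in reversed(steps)]


def run(plan, dry_run=True):
    """Print the plan, or execute each command over ssh if dry_run is False."""
    for host, cmd in plan:
        if dry_run:
            print(f"{host}: {cmd}")
        else:
            subprocess.run(["ssh", host, cmd], check=True)
```

   Calling run(start_plan(steps)) with the default dry_run=True only prints
the commands, which is a safe way to check the order before running anything
over ssh.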

   "shutdown" is normally used when the air conditioner in the computer
room has failed and the whole room is in an emergency, so that we need to
shut down the whole system as fast as possible:
   1. Shut down all the clients immediately (on the head node, only unmount
Lustre without shutting down).
   2. Unmount all the OSTs, MDTs, and the MGT, and then shut down these
servers.
   3. Shut down the head node.
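
   The emergency sequence above can be sketched in Python as follows. This
is only an illustration of the ordering, not the code in cfsd. The function
name, the (host, kind, mountpoint) tuples, and the use of "shutdown -h now"
as the power-off command are assumptions.

```python
UNMOUNT_ORDER = ("ost", "mdt", "mgt")  # reverse of the start order


def shutdown_plan(head, clients, targets, client_mountpoint):
    """Return (host, command) pairs in emergency-shutdown order.

    targets is a list of (host, kind, mountpoint) tuples, where kind is
    "mgt", "mdt", or "ost".
    """
    plan = []
    # 1. Clients go down first; the head node only unmounts Lustre for now.
    for c in clients:
        if c == head:
            plan.append((c, f"umount {client_mountpoint}"))
        else:
            plan.append((c, "shutdown -h now"))
    # 2. Unmount OSTs, then MDTs, then the MGT ...
    for kind in UNMOUNT_ORDER:
        for host, tkind, mountpoint in targets:
            if tkind == kind:
                plan.append((host, f"umount {mountpoint}"))
    # ... and only then power off each file server once.
    servers = []
    for host, _, _ in targets:
        if host != head and host not in servers:
            servers.append(host)
    for host in servers:
        plan.append((host, "shutdown -h now"))
    # 3. The head node is the last machine to go down.
    plan.append((head, "shutdown -h now"))
    return plan
```

   Keeping the plan as plain data makes it easy to print and review before
an emergency actually happens.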

- client:/etc/init.d/lustre_mnt:
   Sometimes a client has to be rebooted, and we want it to mount Lustre
automatically at boot and unmount Lustre cleanly during shutdown. This
script does that work. It reads /opt/lustre/etc/cfs-chome to check whether
all the file servers are alive, determines whether it should mount Lustre
through Ethernet or InfiniBand, and does the mount. When unmounting, it
also unloads all the Lustre kernel modules afterwards. The usage is:
   # /etc/init.d/lustre_mnt start
   # /etc/init.d/lustre_mnt stop
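
   A rough Python sketch of the client-side decision is given below. It is
not the attached lustre_mnt script. The servers dictionary, the ping-based
liveness check, and the rule "use o2ib if the client has an InfiniBand
device, otherwise tcp" are assumptions. lustre_rmmod is the module-unloading
helper shipped with Lustre.

```python
import os
import subprocess


def ping(addr):
    """Return True if addr answers a single ping within 2 seconds."""
    result = subprocess.run(["ping", "-c", "1", "-W", "2", addr],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def has_infiniband(sysfs="/sys/class/infiniband"):
    """Assumed test: the client has at least one InfiniBand device."""
    return os.path.isdir(sysfs) and bool(os.listdir(sysfs))


def mount_command(servers, mgs, fsname, mountpoint, use_ib, is_alive=ping):
    """Build the client mount command, refusing if any server is down.

    servers maps a host name to {"eth": ip, "ib": ip} (hypothetical layout).
    """
    key, net = ("ib", "o2ib") if use_ib else ("eth", "tcp")
    dead = [name for name, addrs in servers.items()
            if not is_alive(addrs[key])]
    if dead:
        raise RuntimeError("file servers not reachable: " + ", ".join(dead))
    nid = f"{servers[mgs][key]}@{net}"
    return ["mount", "-t", "lustre", f"{nid}:/{fsname}", mountpoint]


def stop_commands(mountpoint):
    """Unmount first, then unload the Lustre kernel modules."""
    return [["umount", mountpoint], ["lustre_rmmod"]]
```

   A "start" action would then run
mount_command(servers, "mds1", "chome", "/chome", has_infiniband()) through
subprocess.run, and a "stop" action would run the two commands from
stop_commands in order.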

- client:/etc/systemd/system/sysinit.target.wants/lustre_mnt.service:
   If the client has an InfiniBand network, it is very annoying that the
system stops OpenIB too early during shutdown, before the Lustre mounts are
unmounted, and then hangs without powering off. Hence, this unit file tells
systemd to wait for "/etc/init.d/lustre_mnt stop" to finish before it
proceeds to shut down OpenIB.

Please note that these scripts may have bugs when used in different
environments. Also note that they do not handle Lustre HA (because we do
not have it). If you have any suggestions, I will appreciate them very
much. I will also be very happy if you find the scripts useful.

Cheers,

T.H.Hsieh

Bertschinger, Thomas Andrew Hjorth via lustre-discuss <
lustre-discuss at lists.lustre.org> wrote on Thu, Dec 7, 2023 at 12:01 AM:

> Hello Jan,
>
> You can use the Pacemaker / Corosync high-availability software stack for
> this: specifically, ordering constraints [1] can be used.
>
> Unfortunately, Pacemaker is probably over-the-top if you don't need HA --
> its configuration is complex and difficult to get right, and it
> significantly complicates system administration. One downside of Pacemaker
> is that it is not easy to decouple the Pacemaker service from the Lustre
> services, meaning if you stop the Pacemaker service, it will try to stop
> all of the Lustre services. This might make it inappropriate for use cases
> that don't involve HA.
>
> Given those downsides, if others in the community have suggestions on
> simpler means to accomplish this, I'd love to see other tools that can be
> used here (especially officially supported ones, if they exist).
>
> [1]
> https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/html/constraints.html#specifying-the-order-in-which-resources-should-start-stop
>
> - Thomas Bertschinger
>
> ________________________________________
> From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf
> of Jan Andersen <jan at comind.io>
> Sent: Wednesday, December 6, 2023 3:27 AM
> To: lustre
> Subject: [EXTERNAL] [lustre-discuss] Coordinating cluster start and
> shutdown?
>
> Are there any tools for coordinating the start and shutdown of lustre
> filesystem, so that the OSS systems don't attempt to mount disks before the
> MGT and MDT are online?
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cfs-chome
Type: application/octet-stream
Size: 1119 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20231210/a25ef9a1/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cfsd
Type: application/octet-stream
Size: 12911 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20231210/a25ef9a1/attachment-0005.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lustre_mnt
Type: application/octet-stream
Size: 3475 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20231210/a25ef9a1/attachment-0006.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lustre_mnt.service
Type: application/octet-stream
Size: 296 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20231210/a25ef9a1/attachment-0007.obj>

