[lustre-discuss] Assistance deploying Slurm HPC cluster with Lustre file system based on Google Cloud Platform (GCP)

Wyatt Gorman wyattgorman at gmail.com
Mon Aug 5 07:52:53 PDT 2019


Hi Eyal,

I'm Wyatt Gorman, an HPC Specialist at Google; I wrote the Lustre
deployment manager scripts and help SchedMD with the Slurm scripts. I have
some good news for you: we've simplified using Lustre and other filesystems
in the upcoming version of the Slurm scripts, so you won't need to install
Lustre manually or add scripts to do so.

Just FYI, if you did want to add custom installation steps, there are
custom installation scripts in the scripts folder for the compute,
controller, and login nodes where you could add the commands you've listed
below.
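For example, the custom install script for each node type could append
something along these lines (treat this as a sketch; the exact script
names and layout may differ between branches):

   # Kernel packages and the Lustre client, run as root during node setup
   yum install -y kernel kernel-devel kernel-headers kernel-abi-whitelists \
       kernel-tools kernel-tools-libs kernel-tools-libs-devel
   yum install -y e2fsprogs lustre-client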

However, there's no need. If you check the "v3" branch of the SchedMD
slurm-gcp repo you'll find a new YAML field where you can specify network
storage mounts, including Lustre. If you specify a Lustre mount, the
Lustre client will automatically be installed, your mount point created,
and the filesystem mounted when the system comes online. You will then
need to adjust the permissions of the mount point, because the filesystem
is mounted as root, but you can modify this behavior if desired.
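As a rough sketch (I'm writing the field names from memory, so check the
example YAML in the v3 branch for the exact spelling), a Lustre entry
looks something like:

   network_storage:
     - server_ip: lustre-mds1
       remote_mount: /lustre
       local_mount: /lustre
       fs_type: lustre
       mount_options: defaults

For the permissions, one option is to create a per-user directory and hand
it over once the mount is up, e.g. "sudo mkdir /lustre/<user>" followed by
"sudo chown <user>:<group> /lustre/<user>" (the names in angle brackets are
placeholders for your own accounts and groups).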

Let me know if you'd like to chat about your work and talk through your
plans to identify any other areas where you might save some effort.

And just FYI, in the future this question might be better suited for the
Google Cloud Slurm Discussion Group (
https://groups.google.com/forum/#!forum/google-cloud-slurm-discuss), where
we have folks regularly monitoring messages.

Thanks,
Wyatt Gorman

On Sun, Aug 4, 2019 at 4:25 PM <lustre-discuss-request at lists.lustre.org>
wrote:

> Date: Sun, 4 Aug 2019 13:18:20 +0000
> From: Eyal Estrin <eyale at hotmail.com>
> To: "lustre-discuss at lists.lustre.org"
>         <lustre-discuss at lists.lustre.org>
> Subject: [lustre-discuss] Assistance deploying Slurm HPC cluster with
>         Lustre file system based on Google Cloud Platform (GCP)
> Message-ID:
>         <BY5PR13MB31095BCAB4B60827FAB6DE60BBDB0 at BY5PR13MB3109.namprd13.prod.outlook.com>
>
> Content-Type: text/plain; charset="utf-8"
>
> Hi all,
> 1. I am trying to deploy a Slurm HPC cluster on Google Cloud Platform,
> with a Lustre file system, as instructed below:
>    https://codelabs.developers.google.com/codelabs/hpc-slurm-on-gcp/#0
>
> https://cloud.google.com/blog/products/storage-data-transfer/introducing-lustre-file-system-cloud-deployment-manager-scripts
>
> https://github.com/GoogleCloudPlatform/deploymentmanager-samples/tree/master/community/lustre
>
> 2. I have created VPC peering between the Slurm network and the Lustre
> cluster network.
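> For reference, the peering can be created with gcloud commands roughly
> like these (the peering and network names are placeholders for my own):
>     gcloud compute networks peerings create slurm-to-lustre \
>         --network=slurm-network --peer-network=lustre-network --auto-create-routes
>     gcloud compute networks peerings create lustre-to-slurm \
>         --network=lustre-network --peer-network=slurm-network --auto-create-routes
>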
> 3. I have created firewall rules allowing all ports and protocols
> between the Slurm network and the Lustre cluster network.
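> For example, roughly like this (the rule name and source range below are
> placeholders for my own values):
>     gcloud compute firewall-rules create allow-lustre-to-slurm \
>         --network=slurm-network --allow=all --source-ranges=10.20.0.0/16
>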
> 4. I have added host entries for all the Lustre cluster machines to the
> Slurm master node's /etc/hosts.
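> The entries follow the usual /etc/hosts format (the addresses and the OSS
> hostnames here are only examples of the format I used):
>     10.20.0.2   lustre-mds1
>     10.20.0.3   lustre-oss1
>     10.20.0.4   lustre-oss2
>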
> 5. I have installed the following Lustre client prerequisites on the
> Slurm master node:
>    sudo yum install kernel kernel-devel kernel-headers \
>        kernel-abi-whitelists kernel-tools kernel-tools-libs \
>        kernel-tools-libs-devel
>
> 6. I have created /etc/yum.repos.d/lustre.repo with the following
> content:
>
> [lustre-server]
> name=CentOS-$releasever - Lustre
> baseurl=https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el7/server/
> gpgcheck=0
>
> [e2fsprogs]
> name=CentOS-$releasever - Ldiskfs
> baseurl=https://downloads.hpdd.intel.com/public/e2fsprogs/latest/el7/
> gpgcheck=0
>
> [lustre-client]
> name=CentOS-$releasever - Lustre
> baseurl=https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el7/client/
> gpgcheck=0
>
> 7. I have installed the Lustre client packages on the Slurm master node,
> using the following command:
>    sudo yum install e2fsprogs lustre-client
>
> 8. I used the following commands to create a mount point for the Lustre
> file system from within the Slurm master node:
>    sudo mkdir -p /lustre
>    sudo chmod -R 777 /lustre
>
> 9. Because my logged-in account on the Slurm master node on Google Cloud
> Platform is not the root account but a Google G Suite account, the only
> way to mount the filesystem and create a test file inside the mount point
> /lustre is with the following sudo commands:
>     sudo mount -t lustre lustre-mds1:/lustre /lustre
>     sudo touch /lustre/1.txt
>
> I have a couple of problems with the above process:
> A. Even though the mount point (/lustre) has mode 777, the directory is
> still owned by the root user and group, and I am still unable to write
> files into the /lustre mount point. How do I give Google G Suite accounts
> the privilege to read/write/delete files in the /lustre mount point?
>
> B. How do I add the following packages as part of the Slurm deployment
> package on both the Slurm master node and all Slurm compute nodes
> (https://github.com/SchedMD/slurm-gcp)?
>    sudo yum install kernel kernel-devel kernel-headers \
>        kernel-abi-whitelists kernel-tools kernel-tools-libs \
>        kernel-tools-libs-devel
>    sudo yum install e2fsprogs lustre-client
>    Note: For the Lustre client installation, I need to add the
> /etc/yum.repos.d/lustre.repo file with specific content (as instructed
> here: http://wiki.lustre.org/Installing_the_Lustre_Software).
>
>
>
> Thanks,
>
> Eyal Estrin
>

