[lustre-discuss] Assistance deploying Slurm HPC cluster with Lustre file system based on Google Cloud Platform (GCP)
Eyal Estrin
eyale at hotmail.com
Sun Aug 4 06:18:20 PDT 2019
Hi all,
1. I am trying to deploy a Slurm HPC cluster on Google Cloud Platform, with a Lustre file system, following the instructions below:
https://codelabs.developers.google.com/codelabs/hpc-slurm-on-gcp/#0
https://cloud.google.com/blog/products/storage-data-transfer/introducing-lustre-file-system-cloud-deployment-manager-scripts
https://github.com/GoogleCloudPlatform/deploymentmanager-samples/tree/master/community/lustre
2. I have created VPC peering between the Slurm network and the Lustre cluster network.
3. I have created firewall rules allowing all ports and protocols between the Slurm network and the Lustre cluster network.
4. I have added host entries for all the Lustre cluster machines to /etc/hosts on the Slurm master node.
5. I have installed the following Lustre client prerequisites on the Slurm master node:
sudo yum install kernel kernel-devel kernel-headers kernel-abi-whitelists kernel-tools kernel-tools-libs kernel-tools-libs-devel
6. I have created /etc/yum.repos.d/lustre.repo with the following content:
[lustre-server]
name=CentOS-$releasever - Lustre
baseurl=https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el7/server/
gpgcheck=0
[e2fsprogs]
name=CentOS-$releasever - Ldiskfs
baseurl=https://downloads.hpdd.intel.com/public/e2fsprogs/latest/el7/
gpgcheck=0
[lustre-client]
name=CentOS-$releasever - Lustre
baseurl=https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el7/client/
gpgcheck=0
7. I have installed the Lustre client packages on the Slurm master node, using the following command:
sudo yum install e2fsprogs lustre-client
8. I used the following commands to create a mount point for the Lustre file system from within the Slurm master node:
sudo mkdir -p /lustre
sudo chmod 777 -R /lustre
9. On the Slurm master node on Google Cloud Platform, I am logged in with a Google G Suite account rather than the root account, so the only way I can mount the file system and create a test file inside the mount point /lustre is with sudo:
sudo mount -t lustre lustre-mds1:/lustre /lustre
sudo touch /lustre/1.txt
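One thing that may be worth checking here: the chmod in step 8 ran before the mount in step 9, so it changed the local /lustre directory, not the root directory of the Lustre file system that is later mounted on top of it. Below is a minimal sketch for comparing the owner and mode of /lustre before and after mounting. The helper names show_perms and mount_state are my own, and it assumes GNU stat and the util-linux mountpoint command, as found on CentOS 7.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print "owner:group mode path" for a directory (GNU stat syntax).
show_perms() {
    stat -c '%U:%G %a %n' "$1"
}

# Report whether a path is currently a mount point (util-linux `mountpoint`).
mount_state() {
    if mountpoint -q "$1"; then
        echo "mounted"
    else
        echo "not mounted"
    fi
}

# Default to the mount point used in this post; pass another path to override.
target="${1:-/lustre}"
if [ -d "$target" ]; then
    echo "$target is $(mount_state "$target")"
    show_perms "$target"
fi
```

If the mode reported after mounting differs from the 777 set in step 8, the permissions would presumably have to be changed with sudo once the file system is mounted, rather than on the empty directory beforehand.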
I have a couple of problems with the above process:
A. Even though the mount point (/lustre) has mode 777, the directory is still owned by the root user and group, and I am still unable to write files into the /lustre mount point. How do I give Google G Suite accounts permission to read, write, and delete files under /lustre?
B. How do I add the following package installations to the Slurm deployment package (https://github.com/SchedMD/slurm-gcp), so that they run on both the Slurm master node and all Slurm compute nodes?
sudo yum install kernel kernel-devel kernel-headers kernel-abi-whitelists kernel-tools kernel-tools-libs kernel-tools-libs-devel
sudo yum install e2fsprogs lustre-client
Note: For the Lustre client installation, I also need to add /etc/yum.repos.d/lustre.repo with the content shown in step 6 (as instructed here: http://wiki.lustre.org/Installing_the_Lustre_Software)
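To make question B concrete, steps 5 to 7 can be collected into a single script that could run on each node. This is only a rough sketch of what I have in mind, not something taken from the slurm-gcp repository. The function names, the DRY_RUN switch, and the -y flag for unattended installs are my own additions, and how such a script would be hooked into the deployment is exactly what I am asking about.

```shell
#!/usr/bin/env bash
set -euo pipefail

# DRY_RUN=1 (the default in this sketch) only prints each command;
# set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Write the repo definitions from step 6 to the path given as $1.
write_lustre_repo() {
    local dest="$1"
    # The quoted 'EOF' keeps $releasever literal so that yum expands it, not the shell.
    cat > "$dest" <<'EOF'
[lustre-server]
name=CentOS-$releasever - Lustre
baseurl=https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el7/server/
gpgcheck=0
[e2fsprogs]
name=CentOS-$releasever - Ldiskfs
baseurl=https://downloads.hpdd.intel.com/public/e2fsprogs/latest/el7/
gpgcheck=0
[lustre-client]
name=CentOS-$releasever - Lustre
baseurl=https://downloads.hpdd.intel.com/public/lustre/latest-feature-release/el7/client/
gpgcheck=0
EOF
}

install_lustre_client() {
    local repo_file="$1"
    local staged
    # Stage the repo file as the current user, then copy it into place as root.
    staged="$(mktemp)"
    write_lustre_repo "$staged"
    run sudo install -m 0644 "$staged" "$repo_file"
    # Kernel prerequisites (step 5).
    run sudo yum install -y kernel kernel-devel kernel-headers kernel-abi-whitelists kernel-tools kernel-tools-libs kernel-tools-libs-devel
    # Lustre client packages (step 7).
    run sudo yum install -y e2fsprogs lustre-client
    rm -f "$staged"
}

install_lustre_client /etc/yum.repos.d/lustre.repo
```

Run as is, the script only prints the three commands it would execute; running it with DRY_RUN=0 performs the installation.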
Thanks,
Eyal Estrin