[lustre-discuss] Is it that simple to add a pair of new OSTs ?

Audet, Martin Martin.Audet at cnrc-nrc.gc.ca
Mon Apr 7 05:46:37 PDT 2025


Hello Andreas,

Thanks for your response.

Yes, our hardware is a bit old: it was acquired six years ago, in April 2016. But we use quite recent versions of Lustre, the RHEL kernel, the OS and MOFED. Here are some details:

Presently:
Lustre: 2.15.4 compiled from git repository
MOFED: 23.10-2.1.3.1
File server node: RHEL 8.9 kernel 4.18.0-513.24.1 patched for Lustre, all other non-kernel RPMs updated weekly to the latest RHEL 8.10 packages
Head/Compute nodes: RHEL 9.3 kernel 5.14.0-362.24.1, all other non-kernel RPMs updated weekly to the latest RHEL 9.5 packages

After a planned update next week:
Lustre: git commit a71369eb9cb0aa89ede41cb01b2cd9cdcd8e9680 (2.15.6 + 3 patches: LU-18085 llite: use RCU to protect the dentry_data) compiled from git repository
MOFED: 24.10-2.1.8.0
File server node: RHEL 8.10 kernel 4.18.0-553.27.1 patched for Lustre, all other non-kernel RPMs updated weekly to the latest RHEL 8.10 packages
Head/Compute nodes: RHEL 9.5 kernel 5.14.0-503.14.1, all other non-kernel RPMs updated weekly to the latest RHEL 9.5+ packages

When we add the two new OSTs, perhaps in two weeks, we plan to have our compute and head nodes powered off for other reasons, so the race condition is not a problem in our case. But thanks for explaining this potential race.

Now I have another question: it seems that an OSS contacts the MGS to announce its OSTs, and the MGS simply accepts them. I am a bit surprised that nothing needs to be done on the MGS side to restrict which OSS can offer OSTs. I guess it is like that to keep the basic scenario simple. But if we want to improve security, is there some mechanism to restrict which servers can provide an OST? In our case it is very simple since the MGS, MDS and OSS all run on the same server.
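The closest mechanism we found in the manual is the Shared-Secret Key (SSK) feature, which, as we understand it, would let only nodes holding a key connect. A minimal sketch of our reading, assuming our filesystem is named lustrevm; the key path and the choice of the "ski" (integrity) flavor are only examples:

# On the combined MGS/MDS/OSS node, generate a server key
# (the path under /etc/lustre is arbitrary)
lgss_sk -t mgs,server -f lustrevm -w /etc/lustre/lustrevm.key

# Load the key into the kernel keyring, then require an SSK flavor for
# all PtlRPC traffic so that only key holders can connect
lgss_sk -l /etc/lustre/lustrevm.key
lctl conf_param lustrevm.srpc.flavor.default=ski

Is something like this the intended approach, or is there a lighter-weight way to restrict which servers may register OSTs?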

Thanks,

Martin

From: Andreas Dilger <adilger at ddn.com>
Sent: April 5, 2025 3:00
To: Audet, Martin <Martin.Audet at cnrc-nrc.gc.ca>
Cc: lustre-discuss at lists.lustre.org; Raymond, Stephane <Stephane.Raymond at cnrc-nrc.gc.ca>
Subject: EXT: Re: [lustre-discuss] Is it that simple to add a pair of new OSTs ?

It really is that simple.

You didn't mention what version you are using, but based on the hardware and sizes I would assume it is not the latest.

As such, there is a race: if clients are actively creating and writing new files at the instant the OSTs are added, those files may be inaccessible on some clients for a few seconds, until the new OSTs are visible on all clients.

If the clients accessing the filesystem are quiesced during the initial mount then there is no race. Very recent servers and clients have fixed this race.
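If you want to double-check before letting jobs back in, each client's view of the OSTs can be queried directly (a quick sketch; "lustrevm" below stands in for your actual filesystem name):

# Every configured OST should be listed as ACTIVE on this client
lctl get_param lov.lustrevm-*.target_obd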

Cheers, Andreas


On Apr 3, 2025, at 15:12, Audet, Martin via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:

Hello Lustre community,

We are operating a small HPC cluster (576 compute cores) with a small Lustre parallel filesystem (64 TB) connected by an InfiniBand EDR network. The Lustre filesystem is implemented by a single HPE DL380 Gen10 server acting as MGS, MDS and OSS. It has two 32 TB OSTs (HPE MSA 2050). As more space is required, we will soon install 160 TB of additional storage implemented as two 80 TB OSTs (HPE MSA 2060).

We looked in the Lustre documentation (10.2.1. Scaling the Lustre File System: https://doc.lustre.org/lustre_manual.xhtml#idm140220261007664) and made tests with small VMs. It appears that in our case adding this new storage would be very simple. From what we understand, we should do something like this:

# Create mount points for the new OSTs
mkdir /mnt/ost{2,3}

# The MGS runs on this same node, so extract its NID from /etc/lnet.conf
# (assumes a single NID is configured there)
mgs_node="$(sed -n -e 's/^ *- *nid: *//; T; p' < /etc/lnet.conf)"

# Set the devices corresponding to the new OSTs using invariant names
ost2_device=/dev/disk/by-path/...
ost3_device=/dev/disk/by-path/...

# Create the file systems on the new OSTs
mkfs.lustre --fsname=lustrevm --mgsnode=$mgs_node --ost --index=2 $ost2_device
mkfs.lustre --fsname=lustrevm --mgsnode=$mgs_node --ost --index=3 $ost3_device

# Update fstab
cat >> /etc/fstab << _EOF_
$ost2_device /mnt/ost2 lustre defaults,_netdev 0 0
$ost3_device /mnt/ost3 lustre defaults,_netdev 0 0
_EOF_

# Mount the new OSTs
mount /mnt/ost2
mount /mnt/ost3


This appears almost too simple. Are we missing something? Will the new files created by the clients use all four OSTs with no additional effort?
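For what it is worth, here is how we were planning to verify the result from a client (a sketch; the client mount point /mnt/lustre is just an example):

# All four OSTs should appear in the free-space report, the new ones nearly empty
lfs df -h /mnt/lustre

# Newly created files should start landing on obdidx 2 and 3 as the
# allocator favours the emptier OSTs
touch /mnt/lustre/striping_test
lfs getstripe /mnt/lustre/striping_test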

Thanks in advance!

Martin Audet
_______________________________________________
lustre-discuss mailing list
lustre-discuss at lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org