<div dir="ltr"><p>Hi Martin,<br></p><p>I'm resending this message because I wasn't subscribed to the list, and it's important to share this feedback with the community.</p><p></p>
<p>I'm working with Carlos on this Lustre upgrade, and I've been directly involved in the installation and troubleshooting process.</p><p>The issue turned out to be the absence of the <code>kernel-modules</code>
package that corresponds to the modified Lustre kernel. It appears this
package is required for the disk controller and InfiniBand drivers to
function properly.</p>
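For anyone hitting the same problem, the repository we enabled looks roughly like this. The `el8.10/server` path follows the layout of the Whamcloud download tree for 2.15.6; verify the exact path and gpgcheck policy against the directory listing for your distribution:

```ini
# /etc/yum.repos.d/lustre-server.repo -- illustrative; adjust release/distro path
[lustre-server]
name=Lustre 2.15.6 server packages (Whamcloud)
baseurl=https://downloads.whamcloud.com/public/lustre/lustre-2.15.6/el8.10/server/
enabled=1
gpgcheck=0
```

With the repo in place, installing the <code>kernel-modules</code> package whose version matches the Lustre-patched kernel (it ships alongside the patched kernel in the same tree) is what brought in the disk controller and InfiniBand drivers for us.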
<p>After installing the package (available from the Whamcloud
repository), we no longer saw any warnings during installation or
filesystem issues at boot.<br><br>Although the <code>kernel-modules</code> package appears to be considered optional, it proved essential for us, and I believe it would be essential in most cases where someone installs Lustre from the pre-compiled packages.<br><br>We're currently
running performance tests to determine whether we'll need to build and
install the MOFED drivers to get the most out of the InfiniBand network.</p><p>Thanks a lot for your input and interest in our issue — it was really helpful!</p>
<p>Best regards,</p><p>Eloir Troyack</p></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">Em seg., 28 de abr. de 2025 às 18:43, Audet, Martin <<a href="mailto:Martin.Audet@cnrc-nrc.gc.ca">Martin.Audet@cnrc-nrc.gc.ca</a>> escreveu:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="msg-5917798159638654708">
<div dir="ltr">
<div id="m_-5917798159638654708divtagdefaultwrapper" style="font-size:12pt;color:rgb(0,0,0);font-family:Calibri,Helvetica,sans-serif" dir="ltr">
<p>Hello Carlos,</p>
<p><br>
</p>
<p>Your hardware is interesting. It is more powerful than ours.</p>
<p><br>
</p>
<p>Last year in May we performed an upgrade from CentOS 7.10 with Lustre 2.12.4 to Lustre 2.15.4 with RHEL 8.9 (for the Lustre file server) and RHEL 9.3 (for the head and compute nodes).</p>
<p><br>
</p>
<p>It was a big update. We were very nervous.</p>
<p><br>
</p>
<p>We spent a lot of time preparing this general update (almost everything, including firmware, was updated), since we had no auxiliary system to "practice" on (except some VMs). We also spent a lot of time fully scripting the installation process, from the installation
.iso with kickstart to the node in its final state (in 3 flavors: file server, head node or compute node), across multiple reboot steps (in about 30 min) and possibly in parallel. This includes using our compiled Lustre and MOFED RPMs at every step, and developing a repository
system where the custom Lustre or MOFED RPMs can hide the corresponding RPMs of the distribution, while still allowing weekly updates for packages unrelated to the kernel, Lustre or MOFED. All of this was worth the effort: the update this year, using these improved mechanisms,
was much faster and smoother. I believe that compiling Lustre and choosing which git commit to use was also worth the additional effort, as it improves compatibility.</p>
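One common way to sketch such a repo-shadowing setup is a local repository with higher priority plus excludes in the stock repos. This is an illustrative example, not necessarily the exact mechanism described above; the paths and package globs are assumptions:

```ini
# /etc/yum.repos.d/local-lustre.repo -- local repo holding the custom RPMs
[local-lustre]
name=Locally built kernel/Lustre/MOFED RPMs
baseurl=file:///opt/repo/lustre
enabled=1
gpgcheck=0
priority=1

# Then, in the stock distribution repo definitions, mask the packages the
# local repo replaces, so weekly "dnf update" runs never pull them in:
#   exclude=kernel* lustre* mlnx-ofa_kernel*
```

With `priority=1` on the local repo and the excludes on the distribution repos, regular updates keep flowing for everything else while the custom builds stay pinned.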
<p><br>
</p>
<p>I am interested in your problem. When you find the solution, please publish it, as it can help the community.</p>
<p><br>
</p>
<p>Thanks,</p>
<p><br>
</p>
<p>Martin Audet</p>
<div style="color:rgb(0,0,0)">
<hr style="display:inline-block;width:98%">
<div id="m_-5917798159638654708divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Carlos Adean <<a href="mailto:carlosadean@linea.org.br" target="_blank">carlosadean@linea.org.br</a>><br>
<b>Sent:</b> April 28, 2025 4:15 PM<br>
<b>To:</b> Audet, Martin<br>
<b>Cc:</b> <a href="mailto:lustre-discuss@lists.lustre.org" target="_blank">lustre-discuss@lists.lustre.org</a>; Eloir Troyack<br>
<b>Subject:</b> EXT: Re: Re: [lustre-discuss] Installing lustre 2.15.6 server on rhel-8.10 fails</font>
<div> </div>
</div>
<div>
<div><span style="font-weight:bold">***Attention*** This email originated from outside of the NRC. ***Attention*** Ce courriel provient de l'extérieur du CNRC.</span></div>
<div><br>
</div>
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div>Hi Martin,</div>
<div><br>
</div>
<div>I really appreciate the help. <br>
</div>
</div>
<div><br>
</div>
<div>My answers are inline below.</div>
<br>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div dir="ltr">
<div id="m_-5917798159638654708m_8110646284294991417m_-585840447576696450m_7890049679768417789m_-4994286033720769781divtagdefaultwrapper" dir="ltr" style="font-size:12pt;color:rgb(0,0,0);font-family:Calibri,Helvetica,sans-serif">
<p>One question: are you using the precompiled Lustre RPMs (e.g. those available from: <a href="https://downloads.whamcloud.com/public/lustre/lustre-2.15.6/" id="m_-5917798159638654708LPlnk349629" target="_blank">https://downloads.whamcloud.com/public/lustre/lustre-2.15.6/</a> )
or are you compiling your own RPMs from the Lustre git repository ( <a href="https://github.com/lustre/lustre-release" id="m_-5917798159638654708LPlnk133108" target="_blank">https://github.com/lustre/lustre-release</a> ) ?</p>
<p></p>
<p>In our case we use the second approach and I think it is better for two reasons:<br>
<br>
</p>
<blockquote style="margin:0px 0px 0px 40px;border:medium;padding:0px">
<p>1- You make sure that everything is consistent, especially with your MOFED environment</p>
<p>2- You are not forced to use exactly the versions corresponding to tags; you can choose any revision available in the git repository, or cherry-pick the fixes you think are useful (more details on this later).</p>
</blockquote>
</div>
</div>
</div>
</blockquote>
<div><i><br>
</i></div>
<div>
<div><font size="2">Precompiled RPMs. <br>
</font></div>
</div>
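For anyone weighing the second option Martin mentions, building RPMs from the git tree looks roughly like this. This is a sketch following the lustre-release build instructions; the tag and the configure flags (kernel source path, MOFED headers path) are illustrative and depend on your environment:

```shell
git clone https://github.com/lustre/lustre-release.git
cd lustre-release
git checkout 2.15.6                 # or any commit, plus cherry-picked fixes
sh autogen.sh
./configure --with-linux=/usr/src/kernels/$(uname -r) \
            --with-o2ib=/usr/src/ofa_kernel/default   # only if building against MOFED
make rpms                           # produces the server/client RPMs in the tree
```

Building against your exact kernel and MOFED trees is what gives the consistency advantage described in point 1.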
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div dir="ltr">
<div id="m_-5917798159638654708m_8110646284294991417m_-585840447576696450m_7890049679768417789m_-4994286033720769781divtagdefaultwrapper" dir="ltr" style="font-size:12pt;color:rgb(0,0,0);font-family:Calibri,Helvetica,sans-serif">
<p>In our case we upgraded last week a small HPC cluster using RHEL 8 for the file server and RHEL 9 for the clients. The update was successful and we had no problem related to MOFED, Lustre, PMIx, Slurm, MPI (including MPI-IO) up to now.</p>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>Your upgrade scenario is similar to ours. We’re upgrading our servers from RHEL 7 with Lustre 2.12.6 to RHEL 8.10 with Lustre 2.15.x. The clients previously ran RHEL 7 and will now run RHEL 9.5.</div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div dir="ltr">
<div id="m_-5917798159638654708m_8110646284294991417m_-585840447576696450m_7890049679768417789m_-4994286033720769781divtagdefaultwrapper" dir="ltr" style="font-size:12pt;color:rgb(0,0,0);font-family:Calibri,Helvetica,sans-serif">
<p>Our upgrade is described in a message posted on this mailing list on April 7th:</p>
<blockquote style="margin:0px 0px 0px 40px;border:medium;padding:0px">
<p><br>
</p>
<p><a href="http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2025-April/019471.html" target="_blank">http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2025-April/019471.html</a></p>
</blockquote>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>
<div>Actually, our Lustre environment is a bit complex. It has approximately 570 TB of capacity, organized into two tiers: T0 (70 TB) and T1 (500 TB).</div>
<div><br>
</div>
<div>Its infrastructure is composed of two MDS servers connected to a Dell ME4024 storage array, and four OSS servers. Two of these OSS nodes are equipped with NVMe SSDs and provide the T0 tier (high-performance scratch space), while the other two OSS nodes
are connected via SAS to two ME4084 storage arrays, supporting the T1 tier (long-term data). The entire system operates with high availability (HA) and load balancing (LB) mechanisms.</div>
</div>
<div><br>
</div>
<div><br>
</div>
<div>Cheers,</div>
<div><br>
<div>
<div dir="ltr" class="gmail_signature">
<div dir="ltr">---<br>
<div><i><b>Carlos Adean</b></i></div>
<div><a href="https://www.linea.org.br" target="_blank">www.linea.org.br</a></div>
</div>
</div>
</div>
</div>
<br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div></blockquote></div><div><br clear="all"></div><br><span class="gmail_signature_prefix">-- </span><br><div dir="ltr" class="gmail_signature"><div dir="ltr"><span style="color:rgb(0,0,0);font-family:Verdana,Arial,"Bitstream Vera Sans",Helvetica,sans-serif;font-size:13px">Eloir G. S. Troyack<br>Service Desk - LIneA<br></span><a rel="noopener noreferrer" href="https://www.linea.org.br" title="https://www.linea.org.br" target="_blank">www.linea.org.br</a><span style="color:rgb(0,0,0);font-family:Verdana,Arial,"Bitstream Vera Sans",Helvetica,sans-serif;font-size:13px"></span></div></div>