<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
</head>
<body dir="ltr">
<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Helvetica,sans-serif;" dir="ltr">
<p>Hello Carlos,</p>
<p><br>
</p>
<p>I'm sorry that it didn't work.</p>
<p><br>
</p>
<p>One question: are you using the precompiled Lustre RPMs (e.g. those available from: <a href="https://downloads.whamcloud.com/public/lustre/lustre-2.15.6/" class="OWAAutoLink">https://downloads.whamcloud.com/public/lustre/lustre-2.15.6/</a> ) or are you compiling
your own RPMs from the Lustre git repository ( <a href="https://github.com/lustre/lustre-release" class="OWAAutoLink">https://github.com/lustre/lustre-release</a> ) ?</p>
<p><br>
</p>
<p>In our case we use the second approach and I think it is better for two reasons:<br>
<br>
</p>
<blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;">
<p>1- You make sure that everything is consistent, especially with your MOFED environment</p>
<p>2- You are not forced to use exactly the versions corresponding to tags; you can choose any version available in the git repository or cherry-pick the fixes you think are useful (more details on this later).</p>
</blockquote>
<p><br>
</p>
<p>Last week we upgraded a small HPC cluster that uses RHEL 8 for the file server and RHEL 9 for the clients. The update was successful and so far we have had no problems related to MOFED, Lustre, PMIx, Slurm or MPI (including MPI-IO).</p>
<p><br>
</p>
<p>Our upgrade is described in a message posted on this mailing list on April 7th:</p>
<blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;">
<p><br>
</p>
<p><a href="http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2025-April/019471.html" class="OWAAutoLink">http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2025-April/019471.html</a></p>
</blockquote>
<p><br>
</p>
<p>As you can see, we also plan to add storage (OSTs) soon by connecting a new MSA 2060 to our file server (this file server plays the role of MGS, MDS and OSS). <span style="font-size: 12pt;">As you can also see, we did not compile Lustre 2.15.6 exactly.
We compiled a commit on the 2.15 branch containing 2.15.6 plus three additional patches, including <span>LU-18085. Many users running 2.15.6 without this patch (<span style="font-family: Calibri, Helvetica, sans-serif, EmojiFont, "Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSymbols; font-size: 16px;">LU-18085</span>)
complained on lustre-discuss; unfortunately it was added to the 2.15 branch only a few days after 2.15.6 was released. See, for example, this thread on the lustre-discuss mailing list:</span></span></p>
<p><span style="font-size: 12pt;"><span><br>
</span></span></p>
<blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;">
<p><span style="font-size: 12pt;"><span><a href="http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2025-April/019474.html" class="OWAAutoLink">http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2025-April/019474.html</a><br>
</span></span></p>
</blockquote>
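<p>As a rough illustration of building from the branch rather than from the tag, here is a minimal sketch. It only prints the git commands unless DRY_RUN=0 is set. The branch name b2_15, the clone directory and the EXTRA_FIX placeholder are assumptions made for the example, not values taken from our build.</p>
<pre>
```shell
#!/bin/sh
# Sketch only: prints the git commands by default (DRY_RUN=1) instead of running them.
# ASSUMPTIONS: the branch name b2_15, the clone directory "lustre-release" and
# EXTRA_FIX are illustrative; EXTRA_FIX is a placeholder, not a real commit id.
REPO_URL="https://github.com/lustre/lustre-release"
BRANCH="b2_15"
EXTRA_FIX="COMMIT_ID_OF_EXTRA_FIX"

# Echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

prepare_lustre_tree() {
    # Clone the tip of the maintenance branch instead of the release tag.
    run git clone --branch "$BRANCH" "$REPO_URL" lustre-release
    # List the commits on the branch that mention the LU-18085 ticket.
    run git -C lustre-release log --oneline --grep=LU-18085
    # Optional: pull in one more fix that is not on the branch yet.
    run git -C lustre-release cherry-pick "$EXTRA_FIX"
}

prepare_lustre_tree
```
</pre>
<p>Run as is, it shows the three commands that would be executed; drop the cherry-pick line if the branch tip already contains everything you need.</p>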
<p><br>
</p>
<p>I will now outline the procedure we used to get Lustre on our RHEL 8.10 server. It may be overkill, but I think it takes all the precautions and it worked in our case:</p>
<p><br>
</p>
<p></p>
<ol style="margin-bottom: 0px; margin-top: 0px;">
<li>Install RHEL 8.10 on the system using the base kernel you want to patch (<span>4.18.0-553.27.1 in our case). Don't forget kernel-headers (for compiling MOFED) and have the kernel source RPM available (to compile the patched kernel)</span></li>
<li><span>Compile the MOFED RPMs corresponding to the MOFED version you chose (e.g. <span>24.10-2.1.8.0-LTS) using the <span style="font-family: Calibri, Helvetica, sans-serif; font-size: 16px;">mlnx_add_kernel_support.sh script with the --kmp option</span></span></span></li>
<li><font face="Calibri, Helvetica, sans-serif">Install the MOFED RPMs (they will uninstall the OFED from the Linux distro) (in our case we install:<span> mlnx-ofed-all knem mlnxofed-docs libxpmem-devel</span>)</font></li>
<li><font face="Calibri, Helvetica, sans-serif">Reboot (to activate the new MOFED)</font></li>
<li><font face="Calibri, Helvetica, sans-serif">Test MOFED</font></li>
<li><font face="Calibri, Helvetica, sans-serif">Compile the RPMs corresponding to the patched Lustre kernel (you will need the kernel source)</font></li>
<li><font face="Calibri, Helvetica, sans-serif">Put the resulting RPMs on a web server and set up an RPM repository (createrepo_c) so that they can be used during the next system installation</font></li>
<li><font face="Calibri, Helvetica, sans-serif">Reinstall the system, making sure that your kickstart file refers to the repository containing the Lustre patched kernel RPMs (they must hide the corresponding distro RPMs), and reboot</font></li>
<li>Repeat step 2 to compile a new MOFED, since the patched kernel is different</li>
<li>Repeat steps 3 and 4; <span style="font-family: Calibri, Helvetica, sans-serif, EmojiFont, "Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSymbols; font-size: 16px;">your system will now have a MOFED that corresponds
exactly to your kernel patched for Lustre and not to the base kernel (because it is not even installed on the system)</span></li>
<li><span style="font-family: Calibri, Helvetica, sans-serif, EmojiFont, "Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSymbols; font-size: 16px;">Repeat step 5 to test the MOFED on the new kernel</span></li>
<li><span style="font-family: Calibri, Helvetica, sans-serif, EmojiFont, "Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSymbols; font-size: 16px;">Compile the server-specific RPMs related to Lustre</span></li>
<li><span style="font-family: Calibri, Helvetica, sans-serif, EmojiFont, "Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSymbols; font-size: 16px;">Install those server RPMs (in our case: <span>kmod-lustre kmod-lustre-osd-ldiskfs
lustre{,-devel} lustre-iokit lustre-osd-ldiskfs-mount)</span></span></li>
<li><span style="font-family: Calibri, Helvetica, sans-serif, EmojiFont, "Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSymbols; font-size: 16px;"><span>Configure Lustre (e.g. /etc/lnet.conf, /etc/fstab, enable
lnet.service)</span></span></li>
<li>Reboot</li>
<li>With a little luck the Lustre server should be operational</li></ol>
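<p>Two points in the outline above lend themselves to a small shell sketch: publishing the kernel RPMs (step 7) and checking, before rebuilding MOFED and Lustre, that the machine really booted the patched kernel. This is only an illustration. REPO_DIR is a hypothetical path, and the "_lustre" marker is assumed from the kernel naming seen in this thread (e.g. 4.18.0-553.27.1.el8_lustre.x86_64).</p>
<pre>
```shell
#!/bin/sh
# Sketch only. ASSUMPTIONS: REPO_DIR is a hypothetical directory served by your
# web server; the "_lustre" marker follows the kernel naming seen in this thread.
REPO_DIR="/var/www/html/lustre-kernel"

# Succeeds if the given kernel release looks like a Lustre-patched kernel.
is_lustre_kernel() {
    case "$1" in
        *_lustre*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Step 7: (re)generate the repository metadata after copying the RPMs there.
publish_repo() {
    createrepo_c "$REPO_DIR"
}

# Before steps 9 to 12: refuse to continue on the wrong kernel.
check_running_kernel() {
    if is_lustre_kernel "$(uname -r)"; then
        echo "running patched kernel: $(uname -r)"
    else
        echo "not running the Lustre-patched kernel: $(uname -r)" >&2
        return 1
    fi
}
```
</pre>
<p>The block only defines functions; call publish_repo on the web server and check_running_kernel on the freshly reinstalled file server.</p>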
<div><br>
</div>
I hope this helps. Good luck!
<div><br>
</div>
<div>Martin Audet<br>
<br>
<div style="color: rgb(0, 0, 0);">
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Carlos Adean &lt;carlosadean@linea.org.br&gt;<br>
<b>Sent:</b> April 23, 2025 9:06 PM<br>
<b>To:</b> Audet, Martin; lustre-discuss@lists.lustre.org<br>
<b>Cc:</b> Eloir Troyack<br>
<b>Subject:</b> EXT: Re: [lustre-discuss] Installing lustre 2.15.6 server on rhel-8.10 fails</font>
<div> </div>
</div>
<div>
<div><span style="font-weight:bold">***Attention*** This email originated from outside of the NRC. ***Attention*** Ce courriel provient de l'extérieur du CNRC.</span></div>
<div><br>
</div>
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div>Hello Martin,</div>
<div><br>
</div>
<div>Thank you for the hint.<br>
<br>
I tried rebuilding using the suggested parameter, but the warnings persist.<br>
</div>
<div><br>
</div>
<div>Additionally, the system still fails to boot using the lustre kernel. <br>
</div>
<div><br>
</div>
<div>We noticed that Lustre's kernel image does not have the megaraid_sas module, which is used by the system to enable the Dell PERC H330 controller. This may be the cause of the boot failure.</div>
<div><br>
</div>
<div><span style="color:rgb(29,28,29); font-family:Monaco,Menlo,Consolas,"Courier New",monospace; font-size:12px; font-style:normal; font-weight:400; letter-spacing:normal; text-align:left; text-indent:0px; text-transform:none; word-spacing:0px; white-space:pre-wrap; display:inline; float:none">[root@mds2 ~]# lsinitrd /boot/initramfs-4.18.0-553.27.1.el8_lustre.x86_64.img | grep megaraid_sas
[root@mds2 ~]#
</span><br>
</div>
<div><br>
</div>
<div>However, this is not true for the kernel image installed via dnf.</div>
<div><br>
</div>
<div><span style="color:rgb(29,28,29); font-family:Monaco,Menlo,Consolas,"Courier New",monospace; font-size:12px; font-style:normal; font-weight:400; letter-spacing:normal; text-align:left; text-indent:0px; text-transform:none; word-spacing:0px; white-space:pre-wrap; display:inline; float:none">[root@mds2 ~]# lsinitrd /boot/initramfs-4.18.0-553.27.1.el8_10.x86_64.img | grep megaraid_sas
-rw-r--r-- 1 root root 72560 Jan 15 2024 usr/lib/modules/4.18.0-553.27.1.el8_10.x86_64/kernel/drivers/scsi/megaraid/megaraid_sas.ko.xz
[root@mds2 ~]#
</span><br>
</div>
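<p>One way to narrow this down is to check whether the module was built for the Lustre kernel at all, and only then regenerate the initramfs with the driver forced in. The following is a minimal sketch, not a confirmed fix. The MODULES_ROOT override is an assumption added so the lookup can be tried on a scratch directory.</p>
<pre>
```shell
#!/bin/sh
# Sketch only. ASSUMPTION: MODULES_ROOT is /lib/modules unless overridden
# (the override exists purely so the lookup can be tried on a scratch tree).
MODULES_ROOT="${MODULES_ROOT:-/lib/modules}"

# Print the path of a module file for a kernel release, or fail if absent.
find_module() {
    kver=$1
    module=$2
    found=$(find "$MODULES_ROOT/$kver" -name "$module.ko*" 2>/dev/null | head -n 1)
    if [ -n "$found" ]; then
        echo "$found"
    else
        return 1
    fi
}

# Rebuild the initramfs with the driver forced in, but only if the module
# actually exists for that kernel; otherwise report and fail.
rebuild_initramfs_with() {
    kver=$1
    module=$2
    if find_module "$kver" "$module" >/dev/null; then
        dracut --force --add-drivers "$module" "/boot/initramfs-$kver.img" "$kver"
    else
        echo "$module is not available for $kver; install the matching kernel module packages first" >&2
        return 1
    fi
}
```
</pre>
<p>For example, rebuild_initramfs_with 4.18.0-553.27.1.el8_lustre.x86_64 megaraid_sas would either rebuild the image or explain that the driver is missing from that kernel's module tree.</p>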
<div><br>
</div>
<div>I'm still here struggling to install it.</div>
<div><br>
</div>
<div><br clear="all">
</div>
<div>
<div dir="ltr" class="gmail_signature">
<div dir="ltr">
<div></div>
---<br>
<div><i><b>Carlos Adean</b></i></div>
<div><a href="https://www.linea.org.br" target="_blank">www.linea.org.br</a></div>
</div>
</div>
</div>
<br>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, Apr 23, 2025 at 09:22, Audet, Martin &lt;<a href="mailto:Martin.Audet@cnrc-nrc.gc.ca" target="_blank">Martin.Audet@cnrc-nrc.gc.ca</a>&gt; wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex; border-left:1px solid rgb(204,204,204); padding-left:1ex">
<div>
<div dir="ltr">
<div id="m_8997662621852205494m_8633720231178926790divtagdefaultwrapper" dir="ltr" style="font-size:12pt; color:rgb(0,0,0); font-family:Calibri,Helvetica,sans-serif">
<p>Hello,</p>
<p><br>
</p>
<p>I think I had a similar problem a long time ago and it was solved by adding the "--kmp" option to the "<span>mlnx_add_kernel_support.sh" script when compiling the MOFED RPMs. Without this option, the MOFED RPMs compile without problems, and so do the Lustre RPMs,
but later, when installing the Lustre RPMs, we get a bunch of problems related to symbols.</span></p>
<p><span><br>
</span></p>
<p><span>Here is how I compile the MOFED RPMs (using the root account):</span></p>
<p><span><br>
</span></p>
<blockquote style="margin:0px 0px 0px 40px; border:medium; padding:0px">
<p><span><span style="font-family:Consolas,Courier,monospace"># mount_dir is the temporary mount directory</span></span></p>
<p><span><span style="font-family:Consolas,Courier,monospace"># ofed_iso is the MOFED .iso file</span></span></p>
<p><span><span style="font-family:Consolas,Courier,monospace">#<br>
mkdir -p -- "$mount_dir"</span><br>
</span></p>
<p><span><span><span style="font-family:Consolas,Courier,monospace">mount -o ro,loop "$ofed_iso" "$mount_dir"</span><br>
</span></span></p>
<p><span><span><span><span style="font-family:Consolas,Courier,monospace">"$mount_dir"/mlnx_add_kernel_support.sh -y --make-tgz --kmp -k "$(uname -r)" -m "$mount_dir"</span><br>
</span></span></span></p>
<p><span><span><span><span style="font-family:Consolas,Courier,monospace">#</span></span></span></span></p>
<p><span><span><span><span style="font-family:Consolas,Courier,monospace"># The compiled RPMs are now under /tmp</span></span></span></span></p>
<p><span><span><span><span style="font-family:Consolas,Courier,monospace"># ex: /tmp/<span>MLNX_OFED_LINUX-<span>24.10-2.1.8.0-rhel8.10.x86_64</span></span>-ext.tgz</span></span></span></span></p>
<p><span><span><br>
</span></span></p>
</blockquote>
<p><span style="font-size:12pt">It seems that the pre-compiled RPMs distributed by Mellanox/NVIDIA are always generated with the --kmp option, but when using mlnx_add_kernel_support.sh this option must be specified explicitly. In addition, it seems that with the
newer DOCA OFED, the <span style="font-family:Calibri,Helvetica,sans-serif,EmojiFont,"Apple Color Emoji","Segoe UI Emoji",NotoColorEmoji,"Segoe UI Symbol","Android Emoji",EmojiSymbols; font-size:16px">script equivalent to mlnx_add_kernel_support.sh always
adds the --kmp option on RHEL and similar distributions.</span></span></p>
<p><span style="font-size:12pt"><span style="font-family:Calibri,Helvetica,sans-serif,EmojiFont,"Apple Color Emoji","Segoe UI Emoji",NotoColorEmoji,"Segoe UI Symbol","Android Emoji",EmojiSymbols; font-size:16px"><br>
</span></span></p>
<p>I hope it helps,</p>
<p><br>
</p>
<p>Martin</p>
<div style="color:rgb(0,0,0)">
<hr style="display:inline-block; width:98%">
<div id="m_8997662621852205494m_8633720231178926790divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> lustre-discuss <<a href="mailto:lustre-discuss-bounces@lists.lustre.org" target="_blank">lustre-discuss-bounces@lists.lustre.org</a>>
on behalf of Carlos Adean via lustre-discuss <<a href="mailto:lustre-discuss@lists.lustre.org" target="_blank">lustre-discuss@lists.lustre.org</a>><br>
<b>Sent:</b> April 22, 2025 11:09 PM<br>
<b>To:</b> <a href="mailto:lustre-discuss@lists.lustre.org" target="_blank">lustre-discuss@lists.lustre.org</a><br>
<b>Cc:</b> Eloir Troyack<br>
<b>Subject:</b> EXT: [lustre-discuss] Installing lustre 2.15.6 server on rhel-8.10 fails</font>
<div> </div>
</div>
<div>
<div><br>
</div>
<div dir="ltr">
<div>Hello all,</div>
<div><br>
</div>
<div>My current version of RHEL 8 is Rocky Linux 8.10, running the kernel 4.18.0-553.27.1.el8_10. I also have the OFED drivers version 24.10-2.1.8.0 installed for the InfiniBand interface (I tried without OFED before).</div>
<div></div>
<div><span style="font-family:monospace"><br>
</span></div>
<div>The installation of "kmod-lustre-2.15.6-1.el8" and "kmod-lustre-osd-ldiskfs-2.15.6-1" always shows these warning messages below.</div>
<div><span style="font-family:monospace"><br>
</span></div>
<div><span style="font-family:monospace"># dnf --nogpgcheck --enablerepo=lustre-server install kmod-lustre kmod-lustre-osd-ldiskfs lustre-osd-ldiskfs-mount lustre lustre-resource-agents</span></div>
<span style="font-family:monospace">[...]</span>
<div><span style="font-family:monospace">depmod: WARNING: /lib/modules/4.18.0-553.27.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol __ib_alloc_pd<br>
depmod: WARNING: /lib/modules/4.18.0-553.27.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol rdma_resolve_addr<br>
depmod: WARNING: /lib/modules/4.18.0-553.27.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol ib_dereg_mr_user<br>
depmod: WARNING: /lib/modules/4.18.0-553.27.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol rdma_reject<br>
depmod: WARNING: /lib/modules/4.18.0-553.27.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol rdma_disconnect<br>
depmod: WARNING: /lib/modules/4.18.0-553.27.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol __rdma_create_kernel_id<br>
depmod: WARNING: /lib/modules/4.18.0-553.27.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol ib_register_event_handler<br>
depmod: WARNING: /lib/modules/4.18.0-553.27.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol rdma_resolve_route<br>
depmod: WARNING: /lib/modules/4.18.0-553.27.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol ib_unregister_event_handler<br>
depmod: WARNING: /lib/modules/4.18.0-553.27.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol rdma_bind_addr<br>
depmod: WARNING: /lib/modules/4.18.0-553.27.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol rdma_create_qp<br>
depmod: WARNING: /lib/modules/4.18.0-553.27.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol ib_map_mr_sg<br>
depmod: WARNING: /lib/modules/4.18.0-553.27.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol ib_query_port<br>
depmod: WARNING: /lib/modules/4.18.0-553.27.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol rdma_notify<br>
depmod: WARNING: /lib/modules/4.18.0-553.27.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol rdma_listen<br>
depmod: WARNING: /lib/modules/4.18.0-553.27.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol rdma_destroy_qp<br>
depmod: WARNING: /lib/modules/4.18.0-553.27.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol __ib_create_cq<br>
depmod: WARNING: /lib/modules/4.18.0-553.27.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol ib_alloc_mr<br>
depmod: WARNING: /lib/modules/4.18.0-553.27.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol rdma_connect_locked<br>
depmod: WARNING: /lib/modules/4.18.0-553.27.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol rdma_set_reuseaddr<br>
depmod: WARNING: /lib/modules/4.18.0-553.27.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol ib_destroy_cq_user<br>
depmod: WARNING: /lib/modules/4.18.0-553.27.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol ib_modify_qp<br>
depmod: WARNING: /lib/modules/4.18.0-553.27.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol ib_dma_virt_map_sg<br>
depmod: WARNING: /lib/modules/4.18.0-553.27.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol rdma_destroy_id<br>
depmod: WARNING: /lib/modules/4.18.0-553.27.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol rdma_accept<br>
depmod: WARNING: /lib/modules/4.18.0-553.27.1.el8_lustre.x86_64/extra/lustre/net/ko2iblnd.ko needs unknown symbol ib_dealloc_pd_user</span></div>
<div><span style="font-family:monospace">[...]<br>
Installed:<br>
kernel-core-4.18.0-553.27.1.el8_lustre.x86_64 kmod-lustre-2.15.6-1.el8.x86_64 kmod-lustre-osd-ldiskfs-2.15.6-1.el8.x86_64 lustre-2.15.6-1.el8.x86_64 lustre-osd-ldiskfs-mount-2.15.6-1.el8.x86_64
<br>
lustre-resource-agents-2.15.6-1.el8.x86_64 <br>
</span></div>
<div><span style="font-family:monospace"><br>
</span></div>
<div><span style="font-family:monospace">Completed!</span></div>
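<p>Since the depmod warnings above are long and repetitive, a small filter can condense them to one line per module. This is a minimal sketch; the function name is made up for the example, and it assumes the "depmod: WARNING: ... needs unknown symbol ..." line format shown above.</p>
<pre>
```shell
#!/bin/sh
# Sketch only: summarise "needs unknown symbol" warnings read from stdin.
# Prints one line per module: the module path, then the number of distinct
# unresolved symbols reported for it.
summarize_unknown_symbols() {
    awk '
        /needs unknown symbol/ {
            module = $3          # field after "depmod: WARNING:"
            symbol = $NF         # last field is the symbol name
            if (!((module, symbol) in seen)) {
                seen[module, symbol] = 1
                count[module]++
            }
        }
        END { for (m in count) print m, count[m] }
    '
}
```
</pre>
<p>It can be fed from a saved install log, or from depmod directly by redirecting its error output into the function.</p>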
<div><br>
</div>
<div><br>
</div>
<div>After rebooting, the server drops into an emergency shell because it can't find the LVM devices. This issue only occurs with the Lustre kernel; other installed kernels boot normally.</div>
<div></div>
<div><br>
</div>
<div></div>
<div><br>
</div>
<div>Any hints on how to proceed?</div>
<br>
<div><br clear="all">
</div>
<div>
<div dir="ltr" class="gmail_signature">
<div dir="ltr">
<div></div>
---<br>
<div><i><b>Carlos Adean</b></i></div>
<div><a href="https://www.linea.org.br" id="m_8997662621852205494m_8633720231178926790LPlnk223764" target="_blank">www.linea.org.br</a></div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</body>
</html>