[lustre-discuss] DKMS build broken with NVIDIA doca packages
Christopher J Orr
cjorr at purdue.edu
Wed Jan 21 08:44:06 PST 2026
This is how I ended up fixing it on Lustre 2.14.0_ddn191 on Rocky 9.7
with DOCA-OFED.
------------------------------------------------------------------
--- lustre-dkms_pre-build.sh.orig 2026-01-06 16:55:25.428285300 -0500
+++ lustre-dkms_pre-build.sh 2026-01-06 18:00:28.357307490 -0500
@@ -9,8 +9,9 @@
case $1 in
lustre-client)
+ [ -f /etc/sysconfig/lustre ] && . /etc/sysconfig/lustre
SERVER="--disable-server"
- KERNEL_STUFF=""
+ KERNEL_STUFF="${KERNEL_STUFF:-}"
;;
lustre-zfs|lustre-all)
------------------------------------------------------------------
...and then, add
KERNEL_STUFF="--with-o2ib=/usr/src/ofa_kernel/default/"
...to /etc/sysconfig/lustre
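
For anyone who wants to try the override logic outside of DKMS first, here is a minimal sketch of what the two changed lines do. The function name client_kernel_stuff and the LUSTRE_SYSCONFIG variable are made up for illustration (the patch itself hard-codes /etc/sysconfig/lustre), so treat this as a test harness and not as part of the fix.

```shell
# Hypothetical knob so the snippet can be pointed at a scratch file;
# the patch above always reads /etc/sysconfig/lustre.
LUSTRE_SYSCONFIG="${LUSTRE_SYSCONFIG:-/etc/sysconfig/lustre}"

# Hypothetical helper: print the KERNEL_STUFF value the lustre-client
# branch would end up with after the patch.
client_kernel_stuff() {
    # Subshell, so the sourced settings do not leak into the caller.
    (
        # Pick up site-local settings if the file exists.
        [ -f "$LUSTRE_SYSCONFIG" ] && . "$LUSTRE_SYSCONFIG"
        # Keep a value set by the file (or inherited from the environment);
        # otherwise fall back to empty, which is what the original script set.
        KERNEL_STUFF="${KERNEL_STUFF:-}"
        printf '%s\n' "$KERNEL_STUFF"
    )
}
```

Calling it as `LUSTRE_SYSCONFIG=/tmp/lustre.test client_kernel_stuff` should print whatever KERNEL_STUFF line you put in the scratch file, and an empty line if the file is missing.
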
I hope this helps!
Thanks,
Christopher Orr
On Wed, 2026-01-21 at 16:16 +0000, Patrick Farrell via lustre-discuss
wrote:
>
> Folks, if you want to create a JIRA ticket, you can ask for an
> account. We're very happy to get contributions.
>
>
> Regards,
> Patrick
>
>
> From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on
> behalf of Jon Marshall via lustre-discuss
> <lustre-discuss at lists.lustre.org>
> Sent: Wednesday, January 21, 2026 9:36 AM
> To: Mark Dixon <mark.c.dixon at durham.ac.uk>
> Cc: lustre-discuss at lists.lustre.org <lustre-discuss at lists.lustre.org>
> Subject: Re: [lustre-discuss] DKMS build broken with NVIDIA doca
> packages
>
>
>
>
>
>
> Hi Mark,
>
>
> Thanks for confirming I'm not on my own - I've not got any further,
> other than starting to look at creating a dummy RPM package that fits
> the criteria Lustre is looking for! That, or using a very clunky
> wrapper script around rpm itself to lie to the configure script. I
> actually have this second approach working, so there is nothing
> wrong with building against the doca packages, but it's a bit annoying
> to automate the build process for our servers like this.
>
>
> I've not got access to create a Jira ticket myself either.
>
>
> Cheers
> Jon
>
>
> From: Mark Dixon <mark.c.dixon at durham.ac.uk>
> Sent: Wednesday, January 21, 2026 12:23
> To: Jon Marshall <Jon.Marshall at cruk.cam.ac.uk>
> Cc: lustre-discuss at lists.lustre.org <lustre-discuss at lists.lustre.org>
> Subject: Re: [lustre-discuss] DKMS build broken with NVIDIA doca
> packages
>
>
>
>
> Hi Jon,
>
> As it happens, I've been looking at the same thing. I hadn't spotted
> LU-18002 (thanks), but unfortunately it isn't enough to accommodate the
> move to dkms on rhel.
>
> I don't know how far you've got since Monday, but there now seems to be
> a need for an explicit check of /usr/src/ofa_kernel (as it's no longer
> owned by a package) and the "find" for rdma_cm.h needs the -L flag to
> make sense of the new maze of twisty passages.
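
The two checks described above could be sketched roughly like this. find_o2ib_path is a hypothetical helper, not code from the Lustre tree, and it assumes the header sits under an include/rdma/ directory somewhere below the root you pass in.

```shell
# Hypothetical helper: locate the OFED source tree that provides rdma_cm.h.
# Takes the root to search as $1 (defaults to /usr/src/ofa_kernel).
find_o2ib_path() {
    local root="${1:-/usr/src/ofa_kernel}"
    local hdr

    # Explicit existence check, since no package owns the directory any more.
    [ -d "$root" ] || return 1

    # -L makes find follow symlinks, so a header that is only reachable
    # through a link is still found. If several trees match, this simply
    # takes the first one find reports.
    hdr=$(find -L "$root" -name rdma_cm.h -path '*/include/rdma/*' 2>/dev/null | head -n 1)
    [ -n "$hdr" ] || return 1

    # Strip the trailing /include/rdma/rdma_cm.h to get the tree root.
    printf '%s\n' "${hdr%/include/rdma/rdma_cm.h}"
}
```

Something like `find_o2ib_path /usr/src/ofa_kernel` would then print a candidate path for --with-o2ib, or return non-zero if nothing usable is there.
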
>
> I think that a new jira ticket needs to be opened...
>
> Cheers,
>
> Mark
>
>
> On Mon, 19 Jan 2026, Jon Marshall via lustre-discuss wrote:
>
> > Hi,
> >
> > I'm in the process of rebuilding lustre on Rocky 8.10 and have
> > noticed that NVIDIA have been messing around with their packages
> > again, now rebranding everything under the doca label. For LTS
> > purposes we're sticking with 2.15.8 for lustre, and I'm trying to
> > get this to build with NVIDIA DOCA 3.2.1 LTS.
> >
> > The trouble is, it seems they have renamed the package
> > mlnx-ofa_kernel-devel to mlnx-ofa_kernel-dkms. Looking at the DKMS
> > configure script, it is searching for:
> > O2IBPKG="mlnx-ofed-kernel-dkms"
> > O2IBPKG+="|mlnx-ofed-kernel-modules"
> > O2IBPKG+="|mlnx-ofa_kernel-devel"
> > O2IBPKG+="|compat-rdma-devel"
> > O2IBPKG+="|kernel-ib-devel"
> > O2IBPKG+="|ofa_kernel-devel"
> >
> > And hence it can't find the package (underscore instead of hyphen),
> > which causes the build to fail.
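
To see the mismatch in isolation, the alternation quoted above can be tested against a package name directly. This is only a sketch: o2ib_pkg_matches is a hypothetical helper, the pattern is written without += so it also runs under plain sh, and the real configure script may apply O2IBPKG differently (for example against full rpm -qa output).

```shell
# Same alternation as in the quoted configure snippet.
O2IBPKG="mlnx-ofed-kernel-dkms|mlnx-ofed-kernel-modules|mlnx-ofa_kernel-devel"
O2IBPKG="$O2IBPKG|compat-rdma-devel|kernel-ib-devel|ofa_kernel-devel"

# Hypothetical helper: succeed if package name $1 matches the
# extended-regex alternation $2.
o2ib_pkg_matches() {
    printf '%s\n' "$1" | grep -E -q -e "$2"
}

# The old -devel name is covered by the list...
o2ib_pkg_matches "mlnx-ofa_kernel-devel" "$O2IBPKG" && echo "devel: matched"

# ...but the new DOCA name is not.
o2ib_pkg_matches "mlnx-ofa_kernel-dkms" "$O2IBPKG" || echo "dkms: not matched"

# Appending the new name makes it match. This only illustrates the
# mismatch; it is not the LU-18002 patch.
o2ib_pkg_matches "mlnx-ofa_kernel-dkms" "$O2IBPKG|mlnx-ofa_kernel-dkms" \
    && echo "dkms: matched with extended list"
```
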
> >
> > Digging around the JIRA, I found this issue
> > <https://jira.whamcloud.com/browse/LU-18002>, but it looks to only
> > have been fixed in 2.16, which we've sort of ruled out at this stage.
> > Looking at the actual patch
> > <https://review.whamcloud.com/c/fs/lustre-release/+/55625/4/lnet/autoconf/lustre-lnet.m4>,
> > it seems pretty minor and I was wondering if this could be
> > backported to 2.15 as well.
> >
> > I can work around by building things myself, but I was hoping to be
> > able to yum install the packages direct from the whamcloud repos,
> > as this greatly simplifies my rollout.
> >
> > Cheers
> > Jon
> >
> >
> > Jon Marshall
> > High Performance Computing Specialist
> > IT and Scientific Computing Team
> > Cancer Research UK Cambridge Institute
> > Li Ka Shing Centre | Robinson Way | Cambridge | CB2 0RE
> >