[lustre-discuss] DKMS build broken with NVIDIA doca packages

Mark Dixon mark.c.dixon at durham.ac.uk
Thu Jan 22 01:33:54 PST 2026


Hi Christopher,

We previously used a similar approach but, with the (very welcome!!) move 
to using DKMS on EL, DOCA now supports multiple kernels at the same time 
and so maintains a per-kernel ofa source directory - a blanket default is 
no longer appropriate.

In fact, on one of my test hosts /usr/src/ofa_kernel/default ended up 
becoming a dangling link. Not sure if that was a bug, or if DOCA has given 
up on it.
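
A minimal sketch of how a build hook might choose the OFA source directory per kernel and notice a dangling "default" link. The function name and the <base>/<arch>/<kernel-release> layout are assumptions for illustration, not something confirmed in this thread, so check them against your own DOCA install.

```shell
#!/bin/sh
# Sketch: pick an OFA kernel source directory for a given kernel.
# ASSUMPTION: the per-kernel layout is <base>/<arch>/<kernel-release>.
# "local" is not strictly POSIX, but bash and dash both support it.
ofa_src_dir() {
    local base="${1:-/usr/src/ofa_kernel}"
    local kver="${2:-$(uname -r)}"
    local arch="${3:-$(uname -m)}"

    # Prefer the directory that belongs to the kernel being built.
    if [ -d "$base/$arch/$kver" ]; then
        printf '%s\n' "$base/$arch/$kver"
        return 0
    fi

    # Fall back to "default" only if it resolves; [ -d ] follows
    # symlinks, so a dangling link fails this test.
    if [ -d "$base/default" ]; then
        printf '%s\n' "$base/default"
        return 0
    fi

    # Report a dangling link explicitly, since it is easy to miss.
    if [ -L "$base/default" ]; then
        echo "warning: $base/default is a dangling symlink" >&2
    fi
    return 1
}
```

Something like `--with-o2ib="$(ofa_src_dir)"` could then replace a fixed path, provided the function succeeded.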

Unless Jon gets there first, I'll get a ticket opened when I get to it.

Best,

Mark

On Wed, 21 Jan 2026, Christopher J Orr wrote:

> This is how I ended up fixing it on Lustre 2.14.0_ddn191 on Rocky 9.7
> with DOCA-OFED.
>
> ------------------------------------------------------------------
> --- lustre-dkms_pre-build.sh.orig       2026-01-06 16:55:25.428285300 -0500
> +++ lustre-dkms_pre-build.sh    2026-01-06 18:00:28.357307490 -0500
> @@ -9,8 +9,9 @@
>
> case $1 in
>     lustre-client)
> +       [ -f /etc/sysconfig/lustre ] && . /etc/sysconfig/lustre
>        SERVER="--disable-server"
> -       KERNEL_STUFF=""
> +       KERNEL_STUFF="${KERNEL_STUFF:-}"
>        ;;
>
>     lustre-zfs|lustre-all)
> ------------------------------------------------------------------
>
> ...and then, add
> KERNEL_STUFF="--with-o2ib=/usr/src/ofa_kernel/default/"
> ...to /etc/sysconfig/lustre
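
The override pattern in the patch above can be tried in isolation with a small sketch. The function name and the file argument are illustrative only; the patched script sources /etc/sysconfig/lustre directly.

```shell
#!/bin/sh
# Sketch of the override pattern: source an optional config file, then
# fall back to an empty value if the file set nothing.
# Runs in a subshell, so the caller's environment is left untouched.
load_kernel_stuff() (
    conf=$1
    unset KERNEL_STUFF

    # Same guard as the patch: only source the file if it exists.
    [ -f "$conf" ] && . "$conf"

    # ${VAR:-} keeps a value set by the file and otherwise yields "".
    KERNEL_STUFF="${KERNEL_STUFF:-}"
    printf '%s\n' "$KERNEL_STUFF"
)
```

With no config file present the result is the empty string, which matches the value the unpatched script assigned.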
>
> I hope this helps!
> Thanks,
> Christopher Orr
>
>
> On Wed, 2026-01-21 at 16:16 +0000, Patrick Farrell via lustre-discuss
> wrote:
>>
>> Folks, if you want to create a JIRA ticket, you can ask for an
>> account.  We're very happy to get contributions.
>>
>>
>> Regards,
>> Patrick
>>
>>
>> From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on
>> behalf of Jon Marshall via lustre-discuss
>> <lustre-discuss at lists.lustre.org>
>> Sent: Wednesday, January 21, 2026 9:36 AM
>> To: Mark Dixon <mark.c.dixon at durham.ac.uk>
>> Cc: lustre-discuss at lists.lustre.org <lustre-discuss at lists.lustre.org>
>> Subject: Re: [lustre-discuss] DKMS build broken with NVIDIA doca
>> packages
>>
>>
>>
>>
>>
>>
>> Hi Mark,
>>
>>
>> Thanks for confirming I'm not on my own - I've not got any further,
>> other than starting to look at creating a dummy RPM package that fits
>> the criteria Lustre is looking for! That or using a very clunky
>> wrapper script around rpm itself to lie to the configure script. I
>> actually have got this second approach working so there is nothing
>> wrong with building against the doca packages, but it's a bit annoying
>> to automate the build process for our servers like this.
>>
>>
>> I've not got access to create a Jira ticket myself either.
>>
>>
>> Cheers
>> Jon
>>
>>
>> From: Mark Dixon <mark.c.dixon at durham.ac.uk>
>> Sent: Wednesday, January 21, 2026 12:23
>> To: Jon Marshall <Jon.Marshall at cruk.cam.ac.uk>
>> Cc: lustre-discuss at lists.lustre.org <lustre-discuss at lists.lustre.org>
>> Subject: Re: [lustre-discuss] DKMS build broken with NVIDIA doca
>> packages
>>
>>
>>
>>
>> Hi Jon,
>>
>> As it happens, I've been looking at the same thing. I hadn't spotted
>> LU-18002 (thanks), but unfortunately it isn't enough to accommodate
>> the move to dkms on rhel.
>>
>> I don't know how far you've got since Monday, but there now seems a
>> need for an explicit check of /usr/src/ofa_kernel (as it's no longer
>> owned by a package) and the "find" for rdma_cm.h needs the -L flag to
>> make sense of the new maze of twisty passages.
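
The two checks described above (an explicit directory test, and a "find" that follows symlinks) might look roughly like this. The function name is hypothetical, and the real logic lives in Lustre's autoconf macros, which may be structured differently.

```shell
#!/bin/sh
# Sketch: locate rdma_cm.h under an OFA source tree that may only be
# reachable through symlinks. Prints the first match, if any.
find_rdma_cm_h() (
    root=${1:-/usr/src/ofa_kernel}

    # Explicit directory test, rather than asking rpm which package
    # owns the path.
    [ -d "$root" ] || exit 1

    # -L makes find follow symlinks; without it a symlinked subtree is
    # seen as a link and never descended into.
    find -L "$root" -name rdma_cm.h -print 2>/dev/null | head -n 1
)
```

An empty result with a zero exit status means the directory exists but no header was found beneath it.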
>>
>> I think that a new jira ticket needs to be opened...
>>
>> Cheers,
>>
>> Mark
>>
>>
>> On Mon, 19 Jan 2026, Jon Marshall via lustre-discuss wrote:
>>
>>> Hi,
>>>
>>> I'm in the process of rebuilding lustre on Rocky 8.10 and have
>>> noticed that NVIDIA have been messing around with their packages
>>> again, now rebranding everything under the doca label. For LTS
>>> purposes we're sticking with 2.15.8 for lustre, and I'm trying to
>>> get this to build with NVIDIA DOCA 3.2.1 LTS.
>>>
>>> The trouble is, it seems they have renamed the package
>>> mlnx-ofa_kernel-devel to mlnx-ofa_kernel-dkms. Looking at the DKMS
>>> configure script, it is searching for:
>>>                         O2IBPKG="mlnx-ofed-kernel-dkms"
>>>                         O2IBPKG+="|mlnx-ofed-kernel-modules"
>>>                         O2IBPKG+="|mlnx-ofa_kernel-devel"
>>>                         O2IBPKG+="|compat-rdma-devel"
>>>                         O2IBPKG+="|kernel-ib-devel"
>>>                         O2IBPKG+="|ofa_kernel-devel"
>>>
>>> And hence it can't find the package (underscore instead of hyphen),
>>> which causes the build to fail.
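
The mismatch described above can be reproduced with the quoted pattern alone. This sketch assumes the names are matched as an extended regular expression; how the real configure script applies O2IBPKG (for example against rpm output) is not shown in this thread.

```shell
#!/bin/sh
# Package-name alternation quoted above, joined into one pattern.
O2IBPKG="mlnx-ofed-kernel-dkms|mlnx-ofed-kernel-modules|mlnx-ofa_kernel-devel"
O2IBPKG="$O2IBPKG|compat-rdma-devel|kernel-ib-devel|ofa_kernel-devel"

# ASSUMPTION: a package name counts as found if the regex matches it.
matches_o2ibpkg() {
    printf '%s\n' "$1" | grep -Eq -- "$O2IBPKG"
}

# The old package name is in the list; the DOCA name is not.
matches_o2ibpkg mlnx-ofa_kernel-devel && echo "old name: match"
matches_o2ibpkg mlnx-ofa_kernel-dkms  || echo "DOCA name: no match"
```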
>>>
>>> Digging around the JIRA, I found this
>>> <https://jira.whamcloud.com/browse/LU-18002> issue, but it looks to
>>> only have been fixed in 2.16, which we've sort of ruled out at this
>>> stage. Looking at the actual patch
>>> <https://review.whamcloud.com/c/fs/lustre-release/+/55625/4/lnet/autoconf/lustre-lnet.m4>,
>>> it seems pretty minor and I was wondering if this could be back
>>> ported to 2.15 as well.
>>>
>>> I can work around by building things myself, but I was hoping to be
>>> able to yum install the packages direct from the whamcloud repos,
>>> as this greatly simplifies my rollout.
>>>
>>> Cheers
>>> Jon
>>>
>>>
>>> Jon Marshall
>>>
>>> High Performance Computing Specialist
>>>
>>>
>>>
>>> IT and Scientific Computing Team
>>>
>>>
>>>
>>> Cancer Research UK Cambridge Institute
>>>
>>> Li Ka Shing Centre | Robinson Way | Cambridge | CB2 0RE
>>>
>>> Web<http://www.cruk.cam.ac.uk/>
>>>
>> _______________________________________________
>> lustre-discuss mailing list
>> lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>

