[lustre-discuss] ksym errors with MOFED (again)

Matt Rásó-Barnett matt at rasobarnett.com
Sun Aug 23 07:54:03 PDT 2020


Hi all,

I've been attempting to build Lustre server RPMs against MOFED for the
past few days and keep hitting a dependency problem: the kmod-lustre
package has dependencies on various ksym symbols that are not satisfied
by any of the available MOFED RPMs.

I'm building here with:
Lustre 2.12.5
kernel 3.10.0-1127.8.2.el7_lustre.x86_64
MOFED 4.9-0.1.7.0 

(although I've had the same result with kernel 
3.10.0-1127.18.2.el7_lustre.x86_64 and MOFED 5.1-0.6.6.0):

[user@machine lustre-release]# yum localinstall \
    kmod-lustre-2.12.5-1.el7.x86_64.rpm \
    kmod-lustre-osd-ldiskfs-2.12.5-1.el7.x86_64.rpm \
    lustre-2.12.5-1.el7.x86_64.rpm \
    lustre-osd-ldiskfs-mount-2.12.5-1.el7.x86_64.rpm
Loaded plugins: product-id, search-disabled-repos, subscription-manager
Examining kmod-lustre-2.12.5-1.el7.x86_64.rpm: kmod-lustre-2.12.5-1.el7.x86_64
Marking kmod-lustre-2.12.5-1.el7.x86_64.rpm to be installed
Examining kmod-lustre-osd-ldiskfs-2.12.5-1.el7.x86_64.rpm: kmod-lustre-osd-ldiskfs-2.12.5-1.el7.x86_64
Marking kmod-lustre-osd-ldiskfs-2.12.5-1.el7.x86_64.rpm to be installed
Examining lustre-2.12.5-1.el7.x86_64.rpm: lustre-2.12.5-1.el7.x86_64
Marking lustre-2.12.5-1.el7.x86_64.rpm to be installed
Examining lustre-osd-ldiskfs-mount-2.12.5-1.el7.x86_64.rpm: lustre-osd-ldiskfs-mount-2.12.5-1.el7.x86_64
Marking lustre-osd-ldiskfs-mount-2.12.5-1.el7.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package kmod-lustre.x86_64 0:2.12.5-1.el7 will be installed
--> Processing Dependency: ksym(__ib_alloc_pd) = 0x9cbf7973 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(__ib_create_cq) = 0x89e52306 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(__rdma_accept) = 0x8de99f59 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(__rdma_create_id) = 0xb4dc7b7e for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(backport_dependency_symbol) = 0xb43a926b for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(ib_alloc_mr_user) = 0x1fb7fcc9 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(ib_create_fmr_pool) = 0x1f5667d3 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(ib_dealloc_pd_user) = 0x534a2aa9 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(ib_dereg_mr_user) = 0x02332dc6 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(ib_destroy_cq_user) = 0x6391feb0 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(ib_fmr_pool_map_phys) = 0xdcf9c30f for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(ib_fmr_pool_unmap) = 0xd0481e41 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(ib_get_dma_mr) = 0x366559bd for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(ib_map_mr_sg) = 0x0366904f for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(ib_modify_qp) = 0x31adefba for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(rdma_bind_addr) = 0x445a242e for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(rdma_connect) = 0xaed8f42f for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(rdma_create_qp) = 0x247ddac2 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(rdma_destroy_id) = 0x7ea42958 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(rdma_destroy_qp) = 0xfa90a30a for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(rdma_disconnect) = 0x72109dd0 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(rdma_listen) = 0xff8db636 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(rdma_notify) = 0x7d20777a for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(rdma_reject) = 0x28d81cc0 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(rdma_resolve_addr) = 0x65e39e38 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(rdma_resolve_route) = 0xa3b7af34 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(rdma_set_reuseaddr) = 0x11e1ebcc for package: kmod-lustre-2.12.5-1.el7.x86_64
---> Package kmod-lustre-osd-ldiskfs.x86_64 0:2.12.5-1.el7 will be installed
---> Package lustre.x86_64 0:2.12.5-1.el7 will be installed
---> Package lustre-osd-ldiskfs-mount.x86_64 0:2.12.5-1.el7 will be installed
--> Finished Dependency Resolution
Error: Package: kmod-lustre-2.12.5-1.el7.x86_64 (/kmod-lustre-2.12.5-1.el7.x86_64)
            Requires: ksym(rdma_set_reuseaddr) = 0x11e1ebcc
Error: Package: kmod-lustre-2.12.5-1.el7.x86_64 (/kmod-lustre-2.12.5-1.el7.x86_64)
            Requires: ksym(ib_fmr_pool_map_phys) = 0xdcf9c30f
Error: Package: kmod-lustre-2.12.5-1.el7.x86_64 (/kmod-lustre-2.12.5-1.el7.x86_64)
            Requires: ksym(ib_dealloc_pd_user) = 0x534a2aa9
Error: Package: kmod-lustre-2.12.5-1.el7.x86_64 (/kmod-lustre-2.12.5-1.el7.x86_64)
            Requires: ksym(backport_dependency_symbol) = 0xb43a926b
Error: Package: kmod-lustre-2.12.5-1.el7.x86_64 (/kmod-lustre-2.12.5-1.el7.x86_64)
            Requires: ksym(ib_modify_qp) = 0x31adefba
Error: Package: kmod-lustre-2.12.5-1.el7.x86_64 (/kmod-lustre-2.12.5-1.el7.x86_64)
            Requires: ksym(rdma_resolve_route) = 0xa3b7af34

... snip ...


I know this issue has come up a few times on this list in the past:

- Most recently:
(http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2019-October/016738.html)
- and I raised a similar issue two years ago, the last time I was
building with MOFED:
(http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2018-August/015796.html)

I'm closely following the procedure in the wiki: 
http://wiki.lustre.org/Compiling_Lustre

- Build the patched kernel RPMs and install them (particularly the
*-devel package), removing all other kernels from the build machine

- Build a custom MOFED against the patched kernel:
./mlnx_add_kernel_support.sh --make-tgz --verbose --yes \
    --kernel 3.10.0-1127.8.2.el7_lustre.x86_64 \
    --kernel-sources /usr/src/kernels/3.10.0-1127.8.2.el7_lustre.x86_64 \
    --tmpdir /tmp --distro rhel7.8 \
    --mlnx_ofed /root/MLNX_OFED_LINUX-4.9-0.1.7.0-rhel7.8-x86_64 --kmp

- Install the custom MOFED modules, either:
yum localinstall {mlnx-ofa_kernel-[0-9].*,mlnx-ofa_kernel-devel-[0-9].*,mlnx-ofa_kernel-modules-[0-9].*}.x86_64.rpm
or:
yum localinstall mlnx-ofed-all...

- Build Lustre against the MOFED IB stack:
./configure --enable-server \
    --with-linux=/usr/src/kernels/3.10.0-1127.8.2.el7_lustre.x86_64 \
    --with-o2ib=/usr/src/ofa_kernel/default

However, in every case I've tried, this results in a kmod-lustre package
with dependencies that the rebuilt MOFED module packages do not provide:

[user@machine lustre-release]# rpm -q --requires -p kmod-lustre-2.12.5-1.el7.x86_64.rpm | grep ksym
ksym(__ib_alloc_pd) = 0x9cbf7973
ksym(__ib_create_cq) = 0x89e52306
ksym(__rdma_accept) = 0x8de99f59
ksym(__rdma_create_id) = 0xb4dc7b7e
ksym(backport_dependency_symbol) = 0xb43a926b
ksym(ib_alloc_mr_user) = 0x1fb7fcc9
ksym(ib_create_fmr_pool) = 0x1f5667d3
ksym(ib_dealloc_pd_user) = 0x534a2aa9
ksym(ib_dereg_mr_user) = 0x02332dc6
ksym(ib_destroy_cq_user) = 0x6391feb0
ksym(ib_fmr_pool_map_phys) = 0xdcf9c30f
ksym(ib_fmr_pool_unmap) = 0xd0481e41
ksym(ib_get_dma_mr) = 0x366559bd
ksym(ib_map_mr_sg) = 0x0366904f
ksym(ib_modify_qp) = 0x31adefba
ksym(rdma_bind_addr) = 0x445a242e
ksym(rdma_connect) = 0xaed8f42f
ksym(rdma_create_qp) = 0x247ddac2
ksym(rdma_destroy_id) = 0x7ea42958
ksym(rdma_destroy_qp) = 0xfa90a30a
ksym(rdma_disconnect) = 0x72109dd0
ksym(rdma_listen) = 0xff8db636
ksym(rdma_notify) = 0x7d20777a
ksym(rdma_reject) = 0x28d81cc0
ksym(rdma_resolve_addr) = 0x65e39e38
ksym(rdma_resolve_route) = 0xa3b7af34
ksym(rdma_set_reuseaddr) = 0x11e1ebcc

[user@machine MLNX_LIBS]# rpm -q --provides -p mlnx-ofa_kernel*.rpm | grep ksym
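The requires/provides comparison above can be scripted so that only the unmet entries are printed. This is a minimal sketch, assuming a POSIX shell with `grep` and `sort`; the function name `unmet_ksyms` is hypothetical. A requirement is treated as met here only when an identical line (same symbol and same CRC) appears in the provides list.

```shell
#!/bin/sh
# Hypothetical helper: print the ksym() lines in file $1 (e.g. output of
# `rpm -q --requires -p`) that do not appear verbatim in file $2 (e.g. output
# of `rpm -q --provides -p`). Non-ksym lines in $1 are ignored.
# grep flags: -F fixed strings, -x whole-line match, -v invert, -f patterns from $2.
unmet_ksyms() {
    grep '^ksym(' "$1" | LC_ALL=C sort -u | grep -Fvx -f "$2"
}

# Example in bash (adjust paths to wherever the RPMs live):
#   unmet_ksyms <(rpm -q --requires -p kmod-lustre-2.12.5-1.el7.x86_64.rpm) \
#               <(rpm -q --provides -p mlnx-ofa_kernel*.rpm)
```

An empty result means every ksym() requirement has a matching provide among the packages passed as the second argument.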

As Stefane mentioned in the October thread 
(http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2019-October/016749.html)
the only package in MOFED that appears to provide these symbols is the 
kmod-mlnx-ofa_kernel package that is *only* built when KMP is supported, 
which I've found is only the case when building against an *unpatched* 
distribution kernel.
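Whether the rebuilt (non-KMP) MOFED actually exports the CRCs that kmod-lustre expects, even though no package advertises them as ksym() provides, could be checked against the Module.symvers installed with the MOFED development files. A minimal sketch, assuming POSIX sh and awk, and assuming that file lives under /usr/src/ofa_kernel/default (the path passed to --with-o2ib above); the function name `symvers_to_ksyms` is hypothetical. It also assumes Module.symvers and rpm both print CRCs as zero-padded lowercase hex, as the values in this thread suggest.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a Module.symvers file (fields: CRC, symbol,
# module path, export type, ...) as "ksym(symbol) = CRC" lines, the same
# shape as rpm's ksym() entries. Only fields 1 and 2 are used.
symvers_to_ksyms() {
    awk '$1 ~ /^0x[0-9a-fA-F]+$/ && NF >= 2 { printf "ksym(%s) = %s\n", $2, $1 }' "$1"
}

# Example (Module.symvers path is an assumption; check your install):
#   symvers_to_ksyms /usr/src/ofa_kernel/default/Module.symvers > ofa_ksyms.txt
#   rpm -q --requires -p kmod-lustre-2.12.5-1.el7.x86_64.rpm | grep '^ksym(' \
#       | grep -Fvx -f ofa_ksyms.txt
```

Any lines printed by the final pipeline are requirements whose symbol or CRC does not appear in that Module.symvers.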

e.g.:

# Searching for kmod-mlnx-ofa_kernel in the downloaded MOFED from MLNX
[user@machine ~]# find MLNX_OFED_LINUX-4.9-0.1.7.0-rhel7.8-x86_64/RPMS/COMMON -name 'kmod-mlnx-ofa_kernel*'
MLNX_OFED_LINUX-4.9-0.1.7.0-rhel7.8-x86_64/RPMS/COMMON/kmod-mlnx-ofa_kernel-4.9-OFED.4.9.0.1.7.1.gd3d963b.rhel7u8.x86_64.rpm

# Not present in the MOFED rebuilt against the patched lustre kernel
[user@machine ~]# find MLNX_OFED_LINUX-4.9-0.1.7.0-rhel7.8-x86_64-ext/RPMS/COMMON -name 'kmod-mlnx-ofa_kernel*'
[user@machine ~]#

Digging into the MOFED install script, install.pl (inside
MLNX_OFED_LINUX-4.9-0.1.7.0-rhel7.8-x86_64/src/MLNX_OFED_SRC-4.9-0.1.7.0.tgz),
I can see why. Lines 1401-1404 read:

if ($kmp and ($DISTRO =~ m/XenServer|RHEL5.2|FC|WINDRIVER6|POWERKVM|BLUENIX1/ or $kernel =~ /xs|fbk|fc|debug|lustre/)) {
    print_and_log_colored("KMP is not supported on $DISTRO. Switching to non-KMP mode", $verbose2, "RED");
    $kmp = 0;
}

So if the kernel release string contains 'lustre' anywhere, KMP support
is disabled and the kmod packages are not built.
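To see why the _lustre kernel release trips the kernel-name half of this check while the stock el7 release does not, that half can be reproduced with `grep -E`. A minimal sketch, assuming POSIX sh; the function name `kmp_disabled_for_kernel` is hypothetical, and the $DISTRO half of the install.pl condition is deliberately left out. Because the pattern is unanchored, any occurrence of a listed substring anywhere in the release string is enough.

```shell
#!/bin/sh
# Hypothetical helper mirroring only the `$kernel =~ /xs|fbk|fc|debug|lustre/`
# part of the install.pl condition. Returns 0 (true) when that part matches,
# i.e. when install.pl would fall back to non-KMP mode because of the kernel name.
kmp_disabled_for_kernel() {
    printf '%s\n' "$1" | grep -Eq 'xs|fbk|fc|debug|lustre'
}

for k in 3.10.0-1127.8.2.el7_lustre.x86_64 3.10.0-1127.8.2.el7.x86_64; do
    if kmp_disabled_for_kernel "$k"; then
        echo "$k: non-KMP mode"
    else
        echo "$k: KMP allowed"
    fi
done
# prints:
# 3.10.0-1127.8.2.el7_lustre.x86_64: non-KMP mode
# 3.10.0-1127.8.2.el7.x86_64: KMP allowed
```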

By removing that check and rebuilding, I do get a set of kmod-mlnx-*
RPMs that provide the necessary symbols, e.g.:

[user@machine COMMON]# rpm -q --provides -p \
    kmod-mlnx-ofa_kernel-4.9-OFED.4.9.0.1.7.1.gd3d963b.202008230901.rhel7u8.x86_64.rpm \
    | grep 'ksym(rdma_connect'
ksym(rdma_connect) = 0xaed8f42f

With this installed, I can finally install the Lustre packages.
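Rather than deleting the whole check, a narrower change would remove only the `lustre` alternative from the kernel-name pattern, leaving the other exclusions intact. The untested sketch below assumes GNU sed (for `-i`) and an already-extracted copy of install.pl containing the line quoted above; the function name and the example path are hypothetical, and repacking the patched source for mlnx_add_kernel_support.sh is not covered.

```shell
#!/bin/sh
# Hypothetical helper: in the given install.pl, rewrite
#   $kernel =~ /xs|fbk|fc|debug|lustre/
# to
#   $kernel =~ /xs|fbk|fc|debug/
# so a *_lustre kernel no longer forces non-KMP mode. Requires GNU sed (-i).
# '#' is the s/// delimiter so the '/' in the pattern needs no escaping;
# in a basic regex an unescaped '|' or ')' is a literal character.
drop_lustre_from_kmp_check() {
    sed -i 's#|debug|lustre/)#|debug/)#' "$1"
}

# Example (path to the extracted MOFED source tree is an assumption):
#   drop_lustre_from_kmp_check MLNX_OFED_SRC-4.9-0.1.7.0/install.pl
```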

However this leaves me with a number of questions:

* Is the check that MLNX added actually incorrect? It has been present
since MOFED 4.2; perhaps we shouldn't be building with KMP support at
all, given that this isn't a distro kernel?

* Building MOFED *without* KMP support produces the
mlnx-ofa_kernel-modules package instead, which contains the kernel
modules. Should *this* package not provide the 'ksym' symbols that the
kmod-lustre package is picking up?

* Or is there something wrong with the Lustre build scripts, picking up 
these ksym dependencies when it shouldn't?

* Or am I doing something completely wrong and everyone else is building 
Lustre servers + MOFED happily and I just need to fix my build process 
to match?

Apologies for such a long-winded email, but this has been driving me 
slightly mad the past couple of days and I'd like to get to the bottom 
of what's going on.

If anyone has had success building this combination (which I'm sure
plenty have!), please can you let me know whether you've encountered this
issue, or, if not, what you are doing differently?

Kind regards,
Matt

-- 
Matt Rásó-Barnett

