[lustre-discuss] Lustre 2.15.1 server with ZFS nothing provides ksym

Peter Jones pjones at whamcloud.com
Tue Jul 11 08:59:15 PDT 2023


Jon

Have you tried building 2.15.3 against Rocky 8.6 locally? While it is true that Whamcloud focuses testing on the latest RHEL/Rocky minor release, many sites still use the same Lustre software with down-rev OS versions.

Peter

From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Jon Marshall via lustre-discuss <lustre-discuss at lists.lustre.org>
Reply-To: Jon Marshall <Jon.Marshall at cruk.cam.ac.uk>
Date: Tuesday, July 11, 2023 at 8:38 AM
To: "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
Subject: [lustre-discuss] Lustre 2.15.1 server with ZFS nothing provides ksym

Hi,

I'm having a bit of a nightmare trying to build server packages for 2.15.1. I feel like I've tried quite a few different approaches and am stumped; most likely I am missing something incredibly obvious, so I'd appreciate any pointers. I should point out that I am by no means an expert in any of this, though I have spent about 4 years maintaining various Lustre builds. I am hugely grateful for all the work you guys do on Lustre, and I hope I don't come across as anything other than frustrated by my inability to get this version to install!

To start, we are running Rocky 8.6 with the 4.18.0-372.9.1.el8 kernel. Initially, as with previous Lustre builds I've done, I installed the Whamcloud-provided el8_lustre kernel along with headers, then installed the Mellanox OFED stack from their repos, making sure to use the version that appears to be used for the 2.15.1 builds (in this case 5.6-2.0.9.0). Incidentally, I can't find the specific OFED versions in any compatibility matrices or changelogs, where this information used to be provided, so I've worked backwards from the Whamcloud repos.

In the past, I've then simply yum/dnf installed the Lustre server packages. With this release, however, I immediately run into "Nothing provides _ksym" errors, all of which appear to be for ZFS symbols. A quick check on the Whamcloud Jira throws up LU-16109 <https://jira.whamcloud.com/browse/LU-16109>, which is marked closed but references LU-16059 <https://jira.whamcloud.com/browse/LU-16059>, which says that the issue is fixed, though in 2.15.2 rather than 2.15.1.

I'm intending to build these servers with Kickstart and Puppet, and I'd much prefer to use the official repos than compile it myself, but this is not a hard requirement. A quick check on the Whamcloud repos suggests that 2.15.2 only supports 8.7, not 8.6. This is a bit of a problem as I'd like to keep the same kernel version as the rest of the machines we're running where possible but, again, this is not a hard requirement. I spun up a new build for 8.7 on the same hardware, updated the Mellanox repos to point to the new, correct version and immediately got _ksym errors, but now they appear to be for the OFED symbols instead.

In the meantime, I've seen an email on the mailing list suggesting that the symbols are in fact provided by the kmod-zfs package, which is not provided by the OpenZFS repos but can be built manually, so I thought I'd have another crack at getting 2.15.1 working. I downloaded the tarball, built ZFS and installed the resulting rpms, making sure to install devel, debugsource and debuginfo. I then attempted to build Lustre against this and it all appeared to go OK - I got some rpms out! However, installing them results in the exact same ksym errors. The thing is, the ksyms appear to be present by name in /proc/kallsyms, just not matching exactly.
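
One possible reading of "present by name, just not matching exactly" is that the symbol names line up but the checksums attached to them do not. The sketch below shows how that could be checked. It assumes the dependencies have the form "ksym(<name>) = 0x<checksum>", which is inferred from the error text and not confirmed anywhere in this thread. The symbol names and checksum values are made-up sample data; real input would be the Requires list of the Lustre rpm and the Provides list of the kmod-zfs rpm (for example from rpm -q --requires and rpm -q --provides).

```python
import re

# Assumed shape of an RPM kernel-symbol dependency: ksym(<name>) = 0x<checksum>
KSYM_RE = re.compile(r"ksym\((?P<name>[^)]+)\)\s*=\s*(?P<crc>0x[0-9a-fA-F]+)")


def parse_ksyms(text):
    """Return {symbol name: checksum} for every ksym(...) entry found in text."""
    return {m.group("name"): m.group("crc").lower() for m in KSYM_RE.finditer(text)}


def classify(required, provided):
    """Split required symbols into satisfied, checksum-mismatched, and absent."""
    result = {"ok": [], "mismatch": [], "absent": []}
    for name, crc in sorted(required.items()):
        if name not in provided:
            result["absent"].append(name)      # no package provides this name at all
        elif provided[name] != crc:
            result["mismatch"].append(name)    # name is provided, checksum differs
        else:
            result["ok"].append(name)
    return result


# Hypothetical sample data, for illustration only.
required_text = """
ksym(dmu_tx_assign) = 0x11111111
ksym(dmu_tx_commit) = 0x22222222
ksym(zap_lookup) = 0x33333333
"""
provided_text = """
ksym(dmu_tx_assign) = 0x11111111
ksym(dmu_tx_commit) = 0xdeadbeef
"""

report = classify(parse_ksyms(required_text), parse_ksyms(provided_text))
print(report)
```

If most of the failing symbols land in the "mismatch" bucket rather than "absent", that would point to the Lustre rpms having been built against a different ZFS build from the one installed, rather than to ZFS symbols being missing outright. Since /proc/kallsyms lists symbol names without these checksums, a name match there would not by itself show that the rpm dependency is satisfied.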

The main point I guess is that LU-16059 appears to have been closed erroneously, as on a fresh install the issue is 100% reproducible. I also note that the packages hosted here<https://downloads.whamcloud.com/public/lustre/lustre-2.15.1/el8.6/server/RPMS/x86_64/> have timestamps from 2022-08-10 but the issue was created and closed after this. I'm happy to re-open the bug and provide as much detail as necessary but thought I'd check to see if anyone else has experienced this issue or if I am indeed missing something trivial.

Thanks in advance
Jon


Jon Marshall
High Performance Computing Specialist
IT and Scientific Computing Team
Cancer Research UK Cambridge Institute
Li Ka Shing Centre | Robinson Way | Cambridge | CB2 0RE

Web<http://www.cruk.cam.ac.uk/> | Facebook<http://www.facebook.com/cancerresearchuk> | Twitter<http://twitter.com/CR_UK>


