[lustre-discuss] upgrade 2.12.6 to 2.12.7 - no lnet after reboot - SOLVED

Sid Young sid.young at gmail.com
Wed Nov 10 14:23:53 PST 2021


I've managed to solve this. After checking a few nodes in the cluster, I
discovered that this particular node must have had a partial update, leaving
a mismatch between the kernel version (locked at the base release) and some of
the kernel support packages, which were at a slightly later release. That
mismatch stopped DKMS from generating the required files.

Normally I disable kernel updates in yum so everything stays at the same
release version, and I just update other packages until I'm ready for a major
update cycle.

bad node:

# yum list installed | grep kernel
abrt-addon-kerneloops.x86_64           2.1.11-60.el7.centos       @anaconda
kernel.x86_64                          3.10.0-1160.el7            @anaconda
kernel-debug-devel.x86_64              3.10.0-1160.15.2.el7       @updates
kernel-devel.x86_64                    3.10.0-1160.15.2.el7       @updates
kernel-headers.x86_64                  3.10.0-1160.15.2.el7       @updates
kernel-tools.x86_64                    3.10.0-1160.15.2.el7       @updates
kernel-tools-libs.x86_64               3.10.0-1160.15.2.el7       @updates
#

Working node:
# yum list installed | grep kernel
abrt-addon-kerneloops.x86_64           2.1.11-60.el7.centos       @anaconda
kernel.x86_64                          3.10.0-1160.el7            @anaconda
kernel-debug-devel.x86_64              3.10.0-1160.31.1.el7       @updates
kernel-devel.x86_64                    3.10.0-1160.el7            @/kernel-devel-3.10.0-1160.el7.x86_64
kernel-headers.x86_64                  3.10.0-1160.el7            @anaconda
kernel-tools.x86_64                    3.10.0-1160.el7            @anaconda
kernel-tools-libs.x86_64               3.10.0-1160.el7            @anaconda
#
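
For anyone wanting to spot the same kind of mismatch on other nodes, here is
a rough sketch, not something taken from the steps I actually ran. It compares
a kernel version-release string against a list of "name version-release"
lines and prints whatever does not match. The function names strip_arch and
find_mismatches are made up for illustration, and the rpm query in the
trailing comment assumes an RPM-based system such as CentOS 7.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not from any Lustre or yum tool.

# strip_arch RELEASE
# Drops the trailing ".arch" from `uname -r` style output so it can be
# compared with rpm's VERSION-RELEASE, which carries no architecture suffix.
strip_arch() {
    echo "${1%.*}"
}

# find_mismatches KERNEL_VR
# Reads lines of "name version-release" on stdin and prints every package
# whose version-release differs from KERNEL_VR.
find_mismatches() {
    want=$1
    while read -r name vr; do
        [ -n "$name" ] || continue
        if [ "$vr" != "$want" ]; then
            echo "$name $vr (kernel is $want)"
        fi
    done
}

# On a real node (assumption: RPM-based distribution):
#   rpm -q --qf '%{NAME} %{VERSION}-%{RELEASE}\n' kernel-devel kernel-headers \
#       | find_mismatches "$(strip_arch "$(uname -r)")"
```

Fed the kernel-devel and kernel-headers versions from the bad node, this
would flag both packages, while the working node's versions would produce no
output. A package that is not installed at all makes rpm print a message
rather than a version, so that line would show up as a mismatch too.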

After removing the extraneous release packages and the Lustre packages, I
updated the kernel, re-installed kernel-headers and kernel-devel, and then
installed the (minimal) Lustre client:

# yum list installed|grep lustre
kmod-lustre-client.x86_64              2.12.7-1.el7               @/kmod-lustre-client-2.12.7-1.el7.x86_64
lustre-client.x86_64                   2.12.7-1.el7               @/lustre-client-2.12.7-1.el7.x86_64
lustre-client-dkms.noarch              2.12.7-1.el7               @/lustre-client-dkms-2.12.7-1.el7.noarch
#

And all good: everything mounts and works first go, as expected :)
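
A quick way to confirm that the lnet module really exists for a given kernel
before rebooting, again only as a sketch: the helper names has_module and
list_kernels_with are hypothetical, and the code assumes modules live under a
per-kernel directory such as /usr/lib/modules/<release> and are named
<module>.ko, possibly with a compression suffix.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# has_module MODULES_ROOT KERNEL_RELEASE MODULE
# Succeeds if MODULE.ko (or a compressed variant such as MODULE.ko.xz)
# exists anywhere under MODULES_ROOT/KERNEL_RELEASE.
has_module() {
    [ -d "$1/$2" ] || return 1
    [ -n "$(find "$1/$2" -name "$3.ko*" -print 2>/dev/null | head -n 1)" ]
}

# list_kernels_with MODULES_ROOT MODULE
# Prints each kernel release directory under MODULES_ROOT that holds MODULE.
list_kernels_with() {
    for dir in "$1"/*/; do
        [ -d "$dir" ] || continue
        rel=$(basename "$dir")
        if has_module "$1" "$rel" "$2"; then
            echo "$rel"
        fi
    done
}

# On a real node:
#   has_module /usr/lib/modules "$(uname -r)" lnet \
#       || echo "lnet missing for running kernel"
#   list_kernels_with /usr/lib/modules lnet
```

If the running kernel is not in the list printed by list_kernels_with, then
modprobe would be expected to fail with "Module lnet not found", as it did
on the bad node.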



Sid Young
Translational Research Institute
Brisbane



> ---------- Forwarded message ----------
> From: Sid Young <sid.young at gmail.com>
> To: lustre-discuss <lustre-discuss at lists.lustre.org>
> Cc:
> Bcc:
> Date: Mon, 8 Nov 2021 11:15:59 +1000
> Subject: [lustre-discuss] upgrade 2.12.6 to 2.12.7 - no lnet after reboot?
> I was running 2.12.6 on an HP DL385 running standard CentOS 7.9
> (3.10.0-1160.el7.x86_64) for around 6 months and decided to plan and start
> an upgrade cycle to 2.12.7, so I downloaded and installed the 2.12.7 CentOS
> release from Whamcloud using the 7.9.2009 release RPMs.
>
> # cat /etc/centos-release
> CentOS Linux release 7.9.2009 (Core)
>
> I have tried this on a node and I now get the following error after
> rebooting:
>
> # modprobe -v lnet
> modprobe: FATAL: Module lnet not found.
>
> I suspect it's not built against the kernel, as there are 3 releases showing
> and there were no errors during the yum install process:
>
> # ls -la  /usr/lib/modules
> drwxr-xr-x.  3 root root 4096 Mar 18  2021 3.10.0-1160.2.1.el7.x86_64
> drwxr-xr-x   3 root root 4096 Nov  8 10:32 3.10.0-1160.25.1.el7.x86_64
> drwxr-xr-x.  7 root root 4096 Nov  8 11:02 3.10.0-1160.el7.x86_64
> #
>
> Anyone upgraded this way? Any obvious gotchas I've missed?
>
> Sid Young
> Translational Research Institute
> Brisbane
>
>