[lustre-discuss] Recover from broken lustre updates

Haoyang Liu liuhaoyang at pku.edu.cn
Mon Jul 26 01:28:26 PDT 2021


Hi all,

I am using Lustre 2.7 with Mellanox (mlnx) InfiniBand. I recently ran a system
update by mistake, and since the update the Lustre modules won't load.

System configuration before the update:
centos-7.3, kernel version: 3.10.0-514.2.2.el7_lustre.gba8983e.x86_64
lustre version: 2.7.19.8-3.10.0_514.2.2.el7_lustre.gba8983e.x86_64_gba8983e.x86_64
mlnx-ofed version: 4.2.1.2.0.1.gf8de107.kver.3.10.0_514.2.2.el7_lustre.gba8983e.x86_64.x86_64

System configuration after the update:
centos-7.3, kernel version: 3.10.0-514.2.2.el7_lustre.x86_64
lustre version: 2.7.19.8-3.10.0_514.2.2.el7_lustre.x86_64.x86_64
mlnx-ofed version: 4.2.1.2.0.1.gf8de107.kver.3.10.0_514.2.2.el7_lustre.gba8983e.x86_64.x86_64
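To make the difference between the two configurations explicit, here is a minimal Python sketch that compares the version strings listed above. The dictionary keys and the function name `changed_components` are only illustrative, not part of any Lustre or OFED tool. It just compares strings and checks which ones still carry the old `gba8983e` tag.

```python
# Version strings copied from the two configuration listings above.
BEFORE = {
    "kernel": "3.10.0-514.2.2.el7_lustre.gba8983e.x86_64",
    "lustre": "2.7.19.8-3.10.0_514.2.2.el7_lustre.gba8983e.x86_64_gba8983e.x86_64",
    "mlnx-ofed": "4.2.1.2.0.1.gf8de107.kver.3.10.0_514.2.2.el7_lustre.gba8983e.x86_64.x86_64",
}
AFTER = {
    "kernel": "3.10.0-514.2.2.el7_lustre.x86_64",
    "lustre": "2.7.19.8-3.10.0_514.2.2.el7_lustre.x86_64.x86_64",
    "mlnx-ofed": "4.2.1.2.0.1.gf8de107.kver.3.10.0_514.2.2.el7_lustre.gba8983e.x86_64.x86_64",
}

def changed_components(before, after):
    """Return the names whose version string differs between the two dicts."""
    return sorted(name for name in before if before[name] != after.get(name))

def still_tagged(versions, tag="gba8983e"):
    """Return the names whose version string still contains the old build tag."""
    return sorted(name for name, version in versions.items() if tag in version)

if __name__ == "__main__":
    print("changed by the update:", changed_components(BEFORE, AFTER))
    print("still built for the old kernel:", still_tagged(AFTER))
```

Run on the strings above, this reports that the kernel and lustre entries changed while the mlnx-ofed entry is unchanged and still names the old kernel build.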

The update seems to have only replaced the Linux kernel with a different build (without the gba8983e tag)
and rebuilt the Lustre modules (Lustre itself was not upgraded). However, the Lustre modules appear to have been
built against the wrong version of mlnx-ofed: the installed mlnx-ofed package is still the one built for the
gba8983e kernel. dmesg shows the following errors:


[17509.744301] ko2iblnd: disagrees about version of symbol ib_fmr_pool_unmap
[17509.744307] ko2iblnd: Unknown symbol ib_fmr_pool_unmap (err -22)
[17509.744317] ko2iblnd: disagrees about version of symbol ib_create_cq
[17509.744319] ko2iblnd: Unknown symbol ib_create_cq (err -22)
[17509.744332] ko2iblnd: disagrees about version of symbol rdma_resolve_addr
[17509.744334] ko2iblnd: Unknown symbol rdma_resolve_addr (err -22)
[17509.744345] ko2iblnd: disagrees about version of symbol ib_create_fmr_pool
...
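For anyone who wants to summarize messages like these, below is a rough Python sketch that collects the symbols each module reports a version disagreement on. It assumes the dmesg output has been saved as text in the format shown above. The function name `mismatched_symbols` and the `SAMPLE` variable are hypothetical helpers, not part of any existing tool.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [17509.744301] ko2iblnd: disagrees about version of symbol ib_fmr_pool_unmap
MISMATCH_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<module>\S+): "
    r"disagrees about version of symbol (?P<symbol>\S+)"
)

def mismatched_symbols(dmesg_text):
    """Return {module: sorted list of symbols with a version disagreement}."""
    found = defaultdict(set)
    for line in dmesg_text.splitlines():
        match = MISMATCH_RE.match(line.strip())
        if match:
            found[match.group("module")].add(match.group("symbol"))
    return {module: sorted(symbols) for module, symbols in found.items()}

# The lines quoted in this message, used as sample input.
SAMPLE = """\
[17509.744301] ko2iblnd: disagrees about version of symbol ib_fmr_pool_unmap
[17509.744307] ko2iblnd: Unknown symbol ib_fmr_pool_unmap (err -22)
[17509.744317] ko2iblnd: disagrees about version of symbol ib_create_cq
[17509.744319] ko2iblnd: Unknown symbol ib_create_cq (err -22)
[17509.744332] ko2iblnd: disagrees about version of symbol rdma_resolve_addr
[17509.744334] ko2iblnd: Unknown symbol rdma_resolve_addr (err -22)
[17509.744345] ko2iblnd: disagrees about version of symbol ib_create_fmr_pool
"""

if __name__ == "__main__":
    for module, symbols in mismatched_symbols(SAMPLE).items():
        print(module, symbols)
```

The "Unknown symbol" lines are deliberately ignored here, since in this log each one just repeats the symbol from the preceding "disagrees about version" line.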

I've tried rebuilding mlnx-ofed against the updated kernel, but the problem persists.

My questions:
1) How can I restore the Lustre system to its state before the update? The following RPMs are still present on my server:
----------------
kernel-3.10.0-514.2.2.el7_lustre.gba8983e.x86_64.rpm
kernel-devel-3.10.0-514.2.2.el7_lustre.gba8983e.x86_64.rpm
kernel-headers-3.10.0-514.2.2.el7_lustre.gba8983e.x86_64.rpm
kernel-tools-3.10.0-514.2.2.el7_lustre.gba8983e.x86_64.rpm
kernel-tools-libs-3.10.0-514.2.2.el7_lustre.gba8983e.x86_64.rpm
kernel-tools-libs-devel-3.10.0-514.2.2.el7_lustre.gba8983e.x86_64.rpm
kmod-spl-3.10.0-514.2.2.el7_lustre.gba8983e.x86_64-0.6.5.7-1.el7.x86_64.rpm
kmod-spl-devel-0.6.5.7-1.el7.x86_64.rpm
kmod-spl-devel-3.10.0-514.2.2.el7_lustre.gba8983e.x86_64-0.6.5.7-1.el7.x86_64.rpm
kmod-zfs-3.10.0-514.2.2.el7_lustre.gba8983e.x86_64-0.6.5.7-1.el7.x86_64.rpm
kmod-zfs-devel-0.6.5.7-1.el7.x86_64.rpm
kmod-zfs-devel-3.10.0-514.2.2.el7_lustre.gba8983e.x86_64-0.6.5.7-1.el7.x86_64.rpm
libnvpair1-0.6.5.7-1.el7.x86_64.rpm
libuutil1-0.6.5.7-1.el7.x86_64.rpm
libzfs2-0.6.5.7-1.el7.x86_64.rpm
libzfs2-devel-0.6.5.7-1.el7.x86_64.rpm
libzpool2-0.6.5.7-1.el7.x86_64.rpm
lustre-2.7.19.8-3.10.0_514.2.2.el7_lustre.gba8983e.x86_64_gba8983e.x86_64.rpm
lustre-dkms-2.7.19.8-1.el7.noarch.rpm
lustre-iokit-2.7.19.8-3.10.0_514.2.2.el7_lustre.gba8983e.x86_64_gba8983e.x86_64.rpm
lustre-modules-2.7.19.8-3.10.0_514.2.2.el7_lustre.gba8983e.x86_64_gba8983e.x86_64.rpm
lustre-osd-ldiskfs-2.7.19.8-3.10.0_514.2.2.el7_lustre.gba8983e.x86_64_gba8983e.x86_64.rpm
lustre-osd-ldiskfs-mount-2.7.19.8-3.10.0_514.2.2.el7_lustre.gba8983e.x86_64_gba8983e.x86_64.rpm
lustre-osd-zfs-2.7.19.8-3.10.0_514.2.2.el7_lustre.gba8983e.x86_64_gba8983e.x86_64.rpm
lustre-osd-zfs-mount-2.7.19.8-3.10.0_514.2.2.el7_lustre.gba8983e.x86_64_gba8983e.x86_64.rpm
lustre-source-2.7.19.8-3.10.0_514.2.2.el7_lustre.gba8983e.x86_64_gba8983e.x86_64.rpm
lustre-tests-2.7.19.8-3.10.0_514.2.2.el7_lustre.gba8983e.x86_64_gba8983e.x86_64.rpm
mlnx-ofa_kernel-4.2-OFED.4.2.1.2.0.1.gf8de107.x86_64.rpm
mlnx-ofa_kernel-devel-4.2-OFED.4.2.1.2.0.1.gf8de107.x86_64.rpm
mlnx-ofa_kernel-modules-4.2-OFED.4.2.1.2.0.1.gf8de107.kver.3.10.0_514.2.2.el7_lustre.gba8983e.x86_64.x86_64.rpm
perf-3.10.0-514.2.2.el7_lustre.gba8983e.x86_64.rpm
python-perf-3.10.0-514.2.2.el7_lustre.gba8983e.x86_64.rpm
spl-0.6.5.7-1.el7.x86_64.rpm
spl-dkms-0.6.5.7-1.el7.noarch.rpm
zfs-0.6.5.7-1.el7.x86_64.rpm
zfs-dkms-0.6.5.7-1.el7.noarch.rpm
zfs-dracut-0.6.5.7-1.el7.x86_64.rpm
zfs-test-0.6.5.7-1.el7.x86_64.rpm
----------------
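As a small aid for reading the list above, this Python sketch splits RPM file names into those tied to the old kernel build (the `gba8983e` tag) and those that are not. It only inspects file names and says nothing about whether reinstalling any of them is safe. The function name `split_by_kernel_tag` is hypothetical, and `SAMPLE_RPMS` is just a subset of the list above.

```python
def split_by_kernel_tag(rpm_names, tag="gba8983e"):
    """Split RPM file names into (tied to `tag`, not tied to `tag`)."""
    tied, independent = [], []
    for name in rpm_names:
        # A package is treated as kernel-specific if its name carries the tag.
        (tied if tag in name else independent).append(name)
    return tied, independent

# A subset of the RPM list above, used as sample input.
SAMPLE_RPMS = [
    "kernel-3.10.0-514.2.2.el7_lustre.gba8983e.x86_64.rpm",
    "lustre-modules-2.7.19.8-3.10.0_514.2.2.el7_lustre.gba8983e.x86_64_gba8983e.x86_64.rpm",
    "lustre-dkms-2.7.19.8-1.el7.noarch.rpm",
    "mlnx-ofa_kernel-modules-4.2-OFED.4.2.1.2.0.1.gf8de107.kver.3.10.0_514.2.2.el7_lustre.gba8983e.x86_64.x86_64.rpm",
    "mlnx-ofa_kernel-4.2-OFED.4.2.1.2.0.1.gf8de107.x86_64.rpm",
    "zfs-dkms-0.6.5.7-1.el7.noarch.rpm",
]

if __name__ == "__main__":
    tied, independent = split_by_kernel_tag(SAMPLE_RPMS)
    print("built for the old kernel:")
    for name in tied:
        print("  " + name)
    print("not tied to a kernel build:")
    for name in independent:
        print("  " + name)
```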

2) What is the risk of data loss?


Thanks,

Haoyang

