[lustre-discuss] Recover from broken lustre updates (Haoyang Liu)

肖正刚 guru.novice at gmail.com
Mon Jul 26 19:55:25 PDT 2021


Hi, Haoyang

You should probably rebuild MOFED against the new kernel first, and then
rebuild the Lustre server packages.
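
To see at a glance which symbols the kernel rejected, a small sketch like the one below can summarize the dmesg output. The function name mismatched_symbols is only illustrative, and the sample lines are copied from your message. The ib_* and rdma_* symbols belong to the InfiniBand/RDMA stack rather than to Lustre itself, which is why I would rebuild MOFED before Lustre.

```python
import re

# Matches e.g. "ko2iblnd: disagrees about version of symbol ib_create_cq".
# \s+ also tolerates a symbol name that was wrapped onto the next line.
PATTERN = re.compile(r"(\S+): disagrees about version of symbol\s+(\S+)")


def mismatched_symbols(dmesg_text):
    """Return {module: sorted list of symbols with a version mismatch}."""
    found = {}
    for module, symbol in PATTERN.findall(dmesg_text):
        found.setdefault(module, set()).add(symbol)
    return {module: sorted(symbols) for module, symbols in found.items()}


# Sample input taken from the dmesg excerpt in the original message.
sample = """\
[17509.744301] ko2iblnd: disagrees about version of symbol ib_fmr_pool_unmap
[17509.744307] ko2iblnd: Unknown symbol ib_fmr_pool_unmap (err -22)
[17509.744317] ko2iblnd: disagrees about version of symbol ib_create_cq
[17509.744319] ko2iblnd: Unknown symbol ib_create_cq (err -22)
"""

if __name__ == "__main__":
    print(mismatched_symbols(sample))
```

On a real server you would feed it the full dmesg text instead of the sample.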
1) About restoring
I think you can try switching back to the old kernel first. However, as you
said, you have already rebuilt MOFED under the new kernel, so once you go
back to the old kernel you will need to rebuild MOFED again (make sure the
versions are the same).
If this does not work, you can try reinstalling the I/O servers as you did
at the very beginning. I recommend using a new drive for the OS installation.
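
As a rough way to check that "the versions are the same", the sketch below tests whether a package version string embeds the running kernel release. It is only a substring heuristic under my own assumptions: the helper names kernel_tag and built_for are hypothetical, and the version strings are the ones quoted in your message. RPM version fields cannot contain "-", so the kernel release appears with "_" in package names.

```python
def kernel_tag(kernel_release):
    """Kernel release as it appears inside package versions ('-' becomes '_')."""
    return kernel_release.replace("-", "_")


def built_for(package_version, kernel_release):
    """True if the package version string embeds the given kernel release."""
    return kernel_tag(kernel_release) in package_version


# Values quoted from the "after the update" configuration in the message.
running = "3.10.0-514.2.2.el7_lustre.x86_64"
packages = {
    "lustre": "2.7.19.8-3.10.0_514.2.2.el7_lustre.x86_64.x86_64",
    "mlnx-ofed": "4.2.1.2.0.1.gf8de107.kver."
                 "3.10.0_514.2.2.el7_lustre.gba8983e.x86_64.x86_64",
}

if __name__ == "__main__":
    for name, version in packages.items():
        status = "matches" if built_for(version, running) else "does NOT match"
        print(f"{name}: {status} kernel {running}")
```

With the values from your message, lustre matches the updated kernel while mlnx-ofed still carries the old gba8983e kernel tag, which is consistent with the symbol version errors.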

2) About data loss
No data is lost: it is stored on the MGT, MDT, and OSTs.

Thanks
Regards,

<lustre-discuss-request at lists.lustre.org> wrote on Tue, Jul 27, 2021 at 4:28 AM:

> Message: 1
> Date: Mon, 26 Jul 2021 16:28:26 +0800 (GMT+08:00)
> From: "Haoyang Liu" <liuhaoyang at pku.edu.cn>
> To: lustre-discuss at lists.lustre.org
> Subject: [lustre-discuss] Recover from broken lustre updates
> Message-ID: <5e70f6a4.db93.17ae1edf43b.Coremail.liuhaoyang at pku.edu.cn>
> Content-Type: text/plain; charset=UTF-8
>
> Hi all,
>
> I am using Lustre 2.7 with Mellanox InfiniBand. Recently I performed a system
> update by mistake, and since the update the Lustre modules won't load.
>
> System configuration before the update:
> centos-7.3, kernel version: 3.10.0-514.2.2.el7_lustre.gba8983e.x86_64
> lustre version:
> 2.7.19.8-3.10.0_514.2.2.el7_lustre.gba8983e.x86_64_gba8983e.x86_64
> mlnx-ofed version:
> 4.2.1.2.0.1.gf8de107.kver.3.10.0_514.2.2.el7_lustre.gba8983e.x86_64.x86_64
>
> System configuration after the update:
> centos-7.3, kernel version: 3.10.0-514.2.2.el7_lustre.x86_64
> lustre version: 2.7.19.8-3.10.0_514.2.2.el7_lustre.x86_64.x86_64
> mlnx-ofed version:
> 4.2.1.2.0.1.gf8de107.kver.3.10.0_514.2.2.el7_lustre.gba8983e.x86_64.x86_64
>
> The update seems to have just replaced the Linux kernel with a different patch
> version (without gba8983e) and rebuilt the Lustre modules (Lustre itself was
> not upgraded). However, the Lustre modules were built against the wrong
> version of mlnx-ofed. dmesg shows the following errors:
>
>
> [17509.744301] ko2iblnd: disagrees about version of symbol ib_fmr_pool_unmap
> [17509.744307] ko2iblnd: Unknown symbol ib_fmr_pool_unmap (err -22)
> [17509.744317] ko2iblnd: disagrees about version of symbol ib_create_cq
> [17509.744319] ko2iblnd: Unknown symbol ib_create_cq (err -22)
> [17509.744332] ko2iblnd: disagrees about version of symbol rdma_resolve_addr
> [17509.744334] ko2iblnd: Unknown symbol rdma_resolve_addr (err -22)
> [17509.744345] ko2iblnd: disagrees about version of symbol ib_create_fmr_pool
> ...
>
> I've tried to build mlnx-ofed under the updated kernel, but the problem
> still exists.
>
> My questions:
> 1) How can I restore the Lustre system to its state before the update? The
> following RPMs are already present on my server:
> ----------------
> kernel-3.10.0-514.2.2.el7_lustre.gba8983e.x86_64.rpm
> kernel-devel-3.10.0-514.2.2.el7_lustre.gba8983e.x86_64.rpm
> kernel-headers-3.10.0-514.2.2.el7_lustre.gba8983e.x86_64.rpm
> kernel-tools-3.10.0-514.2.2.el7_lustre.gba8983e.x86_64.rpm
> kernel-tools-libs-3.10.0-514.2.2.el7_lustre.gba8983e.x86_64.rpm
> kernel-tools-libs-devel-3.10.0-514.2.2.el7_lustre.gba8983e.x86_64.rpm
> kmod-spl-3.10.0-514.2.2.el7_lustre.gba8983e.x86_64-0.6.5.7-1.el7.x86_64.rpm
> kmod-spl-devel-0.6.5.7-1.el7.x86_64.rpm
> kmod-spl-devel-3.10.0-514.2.2.el7_lustre.gba8983e.x86_64-0.6.5.7-1.el7.x86_64.rpm
> kmod-zfs-3.10.0-514.2.2.el7_lustre.gba8983e.x86_64-0.6.5.7-1.el7.x86_64.rpm
> kmod-zfs-devel-0.6.5.7-1.el7.x86_64.rpm
> kmod-zfs-devel-3.10.0-514.2.2.el7_lustre.gba8983e.x86_64-0.6.5.7-1.el7.x86_64.rpm
> libnvpair1-0.6.5.7-1.el7.x86_64.rpm
> libuutil1-0.6.5.7-1.el7.x86_64.rpm
> libzfs2-0.6.5.7-1.el7.x86_64.rpm
> libzfs2-devel-0.6.5.7-1.el7.x86_64.rpm
> libzpool2-0.6.5.7-1.el7.x86_64.rpm
> lustre-2.7.19.8-3.10.0_514.2.2.el7_lustre.gba8983e.x86_64_gba8983e.x86_64.rpm
> lustre-dkms-2.7.19.8-1.el7.noarch.rpm
> lustre-iokit-2.7.19.8-3.10.0_514.2.2.el7_lustre.gba8983e.x86_64_gba8983e.x86_64.rpm
> lustre-modules-2.7.19.8-3.10.0_514.2.2.el7_lustre.gba8983e.x86_64_gba8983e.x86_64.rpm
> lustre-osd-ldiskfs-2.7.19.8-3.10.0_514.2.2.el7_lustre.gba8983e.x86_64_gba8983e.x86_64.rpm
> lustre-osd-ldiskfs-mount-2.7.19.8-3.10.0_514.2.2.el7_lustre.gba8983e.x86_64_gba8983e.x86_64.rpm
> lustre-osd-zfs-2.7.19.8-3.10.0_514.2.2.el7_lustre.gba8983e.x86_64_gba8983e.x86_64.rpm
> lustre-osd-zfs-mount-2.7.19.8-3.10.0_514.2.2.el7_lustre.gba8983e.x86_64_gba8983e.x86_64.rpm
> lustre-source-2.7.19.8-3.10.0_514.2.2.el7_lustre.gba8983e.x86_64_gba8983e.x86_64.rpm
> lustre-tests-2.7.19.8-3.10.0_514.2.2.el7_lustre.gba8983e.x86_64_gba8983e.x86_64.rpm
> mlnx-ofa_kernel-4.2-OFED.4.2.1.2.0.1.gf8de107.x86_64.rpm
> mlnx-ofa_kernel-devel-4.2-OFED.4.2.1.2.0.1.gf8de107.x86_64.rpm
> mlnx-ofa_kernel-modules-4.2-OFED.4.2.1.2.0.1.gf8de107.kver.3.10.0_514.2.2.el7_lustre.gba8983e.x86_64.x86_64.rpm
> perf-3.10.0-514.2.2.el7_lustre.gba8983e.x86_64.rpm
> python-perf-3.10.0-514.2.2.el7_lustre.gba8983e.x86_64.rpm
> spl-0.6.5.7-1.el7.x86_64.rpm
> spl-dkms-0.6.5.7-1.el7.noarch.rpm
> zfs-0.6.5.7-1.el7.x86_64.rpm
> zfs-dkms-0.6.5.7-1.el7.noarch.rpm
> zfs-dracut-0.6.5.7-1.el7.x86_64.rpm
> zfs-test-0.6.5.7-1.el7.x86_64.rpm
> ----------------
>
> 2) What is the risk of data loss?
>
>
> Thanks,
>
> Haoyang
>