[lustre-discuss] Corrupted? MDT not mounting

Andrew Elwell andrew.elwell at gmail.com
Sun May 8 17:54:21 PDT 2022


On Fri, 6 May 2022 at 20:04, Andreas Dilger <adilger at whamcloud.com> wrote:
> MOFED is usually preferred over in-kernel OFED, it is just tested and fixed a lot more.

Fair enough. However, is the 2.12.8-ib tree built with all the features?
Specifically, https://downloads.whamcloud.com/public/lustre/lustre-2.12.8-ib/MOFED-4.9-4.1.7.0/el7/server/

If I look at the ib_srp module from the 2.12 in-kernel build:

[root@astrofs-oss3 ~]# find /lib/modules/`uname -r` -name ib_srp.ko.xz
/lib/modules/3.10.0-1160.49.1.el7_lustre.x86_64/kernel/drivers/infiniband/ulp/srp/ib_srp.ko.xz
[root@astrofs-oss3 ~]# rpm -qf /lib/modules/3.10.0-1160.49.1.el7_lustre.x86_64/kernel/drivers/infiniband/ulp/srp/ib_srp.ko.xz
kernel-3.10.0-1160.49.1.el7_lustre.x86_64
[root@astrofs-oss3 ~]# modinfo ib_srp
filename:       /lib/modules/3.10.0-1160.49.1.el7_lustre.x86_64/kernel/drivers/infiniband/ulp/srp/ib_srp.ko.xz
license:        Dual BSD/GPL
description:    InfiniBand SCSI RDMA Protocol initiator
author:         Roland Dreier
retpoline:      Y
rhelversion:    7.9
srcversion:     1FB80E3A962EE7F39AD3959
depends:        ib_core,scsi_transport_srp,ib_cm,rdma_cm
intree:         Y
vermagic:       3.10.0-1160.49.1.el7_lustre.x86_64 SMP mod_unload modversions
signer:         CentOS Linux kernel signing key
sig_key:        FA:A3:27:4B:D9:17:36:F0:FD:43:6A:42:1B:6A:A4:FA:FE:D0:AC:FA
sig_hashalgo:   sha256
parm:           srp_sg_tablesize:Deprecated name for cmd_sg_entries (uint)
parm:           cmd_sg_entries:Default number of gather/scatter entries in the SRP command (default is 12, max 255) (uint)
parm:           indirect_sg_entries:Default max number of gather/scatter entries (default is 12, max is 2048) (uint)
parm:           allow_ext_sg:Default behavior when there are more than cmd_sg_entries S/G entries after mapping; fails the request when false (default false) (bool)
parm:           topspin_workarounds:Enable workarounds for Topspin/Cisco SRP target bugs if != 0 (int)
parm:           prefer_fr:Whether to use fast registration if both FMR and fast registration are supported (bool)
parm:           register_always:Use memory registration even for contiguous memory regions (bool)
parm:           never_register:Never register memory (bool)
parm:           reconnect_delay:Time between successive reconnect attempts
parm:           fast_io_fail_tmo:Number of seconds between the observation of a transport layer error and failing all I/O. "off" means that this functionality is disabled.
parm:           dev_loss_tmo:Maximum number of seconds that the SRP transport should insulate transport layer errors. After this time has been exceeded the SCSI host is removed. Should be between 1 and SCSI_DEVICE_BLOCK_MAX_TIMEOUT if fast_io_fail_tmo has not been set. "off" means that this functionality is disabled.
parm:           ch_count:Number of RDMA channels to use for communication with an SRP target. Using more than one channel improves performance if the HCA supports multiple completion vectors. The default value is the minimum of four times the number of online CPU sockets and the number of completion vectors supported by the HCA. (uint)
parm:           use_blk_mq:Use blk-mq for SRP (bool)
[root@astrofs-oss3 ~]#

... it all looks normal and is capable of mounting our ExaScaler LUNs.

Compare that with the one from 2.12.8-ib:

========================================================================================
 Package                   Arch    Version                      Repository          Size
========================================================================================
Installing:
 kernel                    x86_64  3.10.0-1160.49.1.el7_lustre  lustre-2.12-mofed   50 M
 kmod-lustre-osd-ldiskfs   x86_64  2.12.8_6_g5457c37-1.el7      lustre-2.12-mofed  469 k
 lustre                    x86_64  2.12.8_6_g5457c37-1.el7      lustre-2.12-mofed  805 k
Installing for dependencies:
 kmod-lustre               x86_64  2.12.8_6_g5457c37-1.el7      lustre-2.12-mofed  3.9 M
 kmod-mlnx-ofa_kernel      x86_64  4.9-OFED.4.9.4.1.7.1         lustre-2.12-mofed  1.3 M
 lustre-osd-ldiskfs-mount  x86_64  2.12.8_6_g5457c37-1.el7      lustre-2.12-mofed   15 k
 mlnx-ofa_kernel           x86_64  4.9-OFED.4.9.4.1.7.1         lustre-2.12-mofed  108 k

[root@astrofs-oss1 ~]# find /lib/modules/`uname -r` -name ib_srp.ko.xz
/lib/modules/3.10.0-1160.49.1.el7_lustre.x86_64/kernel/drivers/infiniband/ulp/srp/ib_srp.ko.xz
[root@astrofs-oss1 ~]# rpm -qf /lib/modules/3.10.0-1160.49.1.el7_lustre.x86_64/kernel/drivers/infiniband/ulp/srp/ib_srp.ko.xz
kernel-3.10.0-1160.49.1.el7_lustre.x86_64
[root@astrofs-oss1 ~]# modinfo ib_srp
filename:       /lib/modules/3.10.0-1160.49.1.el7_lustre.x86_64/extra/mlnx-ofa_kernel/drivers/infiniband/ulp/srp/ib_srp.ko
version:        4.9-4.1.7
license:        Dual BSD/GPL
description:    ib_srp dummy kernel module
author:         Alaa Hleihel
retpoline:      Y
rhelversion:    7.9
srcversion:     9ACAA2F5216D9D9FC379EC8
depends:        mlx_compat
vermagic:       3.10.0-1160.49.1.el7_lustre.x86_64 SMP mod_unload modversions
[root@astrofs-oss1 ~]#


which doesn't seem to actually be able to take any of the normal ib_srp parameters:

[root@astrofs-oss1 ~]# modprobe ib_srp
modprobe: ERROR: could not insert 'ib_srp': Unknown symbol in module, or unknown parameter (see dmesg)

[  238.194931] ib_srp: Unknown parameter `cmd_sg_entries'

etc
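
For anyone hitting the same error, a rough way to see which configured options a given ib_srp build will reject is to compare the parameter names from `modinfo -p` against the `options ib_srp ...` lines that `modprobe -c` reports. This is only a sketch: the helper name `unsupported_params` is made up for illustration, and it assumes the options come from a modprobe.d-style `options` line, which the output above does not actually show.

```shell
#!/bin/sh
# Hypothetical helper: print each "name=value" option whose name is not
# in the list of parameters the module accepts.
#   $1: space- or newline-separated parameter names the module accepts
#   $2: option string, e.g. "cmd_sg_entries=255 allow_ext_sg=1"
unsupported_params() {
    # Normalise the accepted list to " name1 name2 " for whole-word matching.
    accepted=" $(printf '%s' "$1" | tr '\n' ' ') "
    for opt in $2; do
        name=${opt%%=*}
        case "$accepted" in
            *" $name "*) ;;                 # the module knows this parameter
            *) printf '%s\n' "$opt" ;;      # the module would reject it
        esac
    done
}

# On a real host (left commented out so the sketch has no side effects):
# accepted=$(modinfo -p ib_srp | cut -d: -f1)
# configured=$(modprobe -c | sed -n 's/^options ib_srp //p')
# unsupported_params "$accepted" "$configured"
```

With the dummy module, `modinfo` lists no `parm:` entries at all, so every configured option would be printed as unsupported, which matches the `Unknown parameter` line in dmesg.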

Any suggestions? I quickly tried installing another mlnx-ofa_kernel
(from http://downloads.linux.hpe.com/SDR/repo/mlnx_ofed/RHEL/7.9/x86_64/4.9-4.1.7.0/)
but got the same dummy module.
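
One thing worth noting from the output above: the `find ... -name ib_srp.ko.xz` pattern only matches the compressed in-kernel copy, while `modinfo` resolves to the uncompressed `.ko` under `extra/mlnx-ofa_kernel`. A minimal sketch for listing every copy and flagging stubs follows; the function names are hypothetical, and treating the word "dummy" in the description as the marker is an assumption based on the one module shown here.

```shell
#!/bin/sh
# Hypothetical helper: list every copy of a module under a modules tree,
# whether compressed (.ko.xz) or not (.ko).
#   $1: modules directory, e.g. /lib/modules/$(uname -r)
#   $2: module name, e.g. ib_srp
list_module_copies() {
    find "$1" -type f -name "$2.ko*" | sort
}

# Hypothetical helper: succeed if a modinfo description string looks like
# a stub module (assumption: stubs say "dummy" in their description).
is_dummy_description() {
    case "$1" in
        *[Dd]ummy*) return 0 ;;
        *) return 1 ;;
    esac
}

# On a real host (left commented out so the sketch has no side effects):
# for f in $(list_module_copies "/lib/modules/$(uname -r)" ib_srp); do
#     desc=$(modinfo -F description "$f")
#     if is_dummy_description "$desc"; then tag=DUMMY; else tag=real; fi
#     printf '%s\t%s\t%s\n' "$tag" "$f" "$(rpm -qf "$f")"
# done
```

Running the commented loop on the oss1 node should show both files along with the package that owns each, which makes it easier to tell whether a replacement mlnx-ofa_kernel package changed anything.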

