[lustre-discuss] Lustre-2.10.5 problem

Tung-Han Hsieh thhsieh at twcp1.phys.ntu.edu.tw
Thu Sep 27 04:20:51 PDT 2018


Dear All,

This is a follow-up to my previous problem with Lustre-2.10.5. After
several more days of testing, I have finally solved it.

The solution is to set the following option in the Linux kernel
configuration:

CONFIG_DEBUG_FS=y

when compiling the kernel (version 3.12.72). Without this option,
Lustre-2.9 works but Lustre-2.10.5 does not: with Lustre-2.10.5,
running "modprobe lustre" returns the error message:

ERROR: could not insert 'lustre': No such device

and dmesg just records:

[191843.804416] LNet: HW NUMA nodes: 2, HW CPU cores: 24, npartitions: 2
[191844.582597] Lustre: Lustre: Build Version: 2.10.5
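
For anyone hitting the same error, one way to confirm whether a kernel
build has the option set is to search its configuration file. The sketch
below is only an illustration: the function name has_debug_fs is
hypothetical, and the path to the .config file is an assumption that
depends on where the kernel was built.

```shell
# Print "enabled" if the given kernel config file sets CONFIG_DEBUG_FS=y,
# otherwise print "disabled".
# has_debug_fs is a hypothetical helper name, not part of Lustre or kmod.
has_debug_fs() {
    config_file="$1"
    if grep -q '^CONFIG_DEBUG_FS=y$' "$config_file"; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

# Example call (the path is an assumption; use your own kernel tree):
#   has_debug_fs /path/to/linux-3.12.72/.config
```

A line such as "# CONFIG_DEBUG_FS is not set" in the file means the
option is off and the kernel has to be rebuilt with it enabled.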

I think Lustre-2.11.0 has the same requirement, since I tried it two
weeks ago and encountered exactly the same problem. For now I am
targeting Lustre-2.10.5, so I have not gone back to retest
Lustre-2.11.0.
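
Because the same failure may show up with other Lustre versions, a
version-independent check of the running kernel before "modprobe lustre"
might be useful. A kernel built with CONFIG_DEBUG_FS=y registers the
debugfs filesystem type, which is then listed in /proc/filesystems. The
sketch below relies on that; the function name debugfs_supported is
hypothetical, and the optional argument exists only so the check can be
tried against a saved copy of the list.

```shell
# Print "yes" if debugfs appears in the filesystem list, otherwise "no".
# debugfs_supported is a hypothetical helper name.
# The list defaults to /proc/filesystems of the running kernel.
debugfs_supported() {
    fs_list="${1:-/proc/filesystems}"
    # Each line is "[nodev]<TAB>name"; the name is always the last field.
    if awk '$NF == "debugfs" { found = 1 } END { exit !found }' "$fs_list"; then
        echo "yes"
    else
        echo "no"
    fi
}

# Example call on the running system:
#   debugfs_supported
```

If this prints "no", the "No such device" error described above is a
likely outcome for Lustre-2.10.5 on that kernel.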

I am sorry that my previous emails speculated wrongly about the cause.
The problem has nothing to do with the version of kmod or udev.
I apologize for making those statements before investigating more
carefully.


Best Regards,

T.H.Hsieh


On Tue, Sep 25, 2018 at 06:00:24PM +0800, Tung-Han Hsieh wrote:
> Hello,
> 
> I just ran another test. I rebooted my newer machine with the
> older kernel 3.12.72 and recompiled Lustre. The system is now:
> 
> Linux OS Debian 9.5, with kmod version 23-2, udev version 232-25+deb9u4,
> linux kernel 3.12.72, gcc-4.9.2.
> 
> Then I compiled Lustre-2.10.5 with
> 
>     ./configure --prefix=/opt/lustre \
>                 --with-linux=/path/to/linux-3.12.72 \
>                 --disable-server
> 
> This time I did not need to modify the Lustre source code at all, and
> I could successfully run "modprobe lustre".
> 
> So the error I encountered on the older system was probably due to the
> kmod or udev version.
> 
> Could anyone confirm my speculation?
> 
> Thanks very much.
> 
> 
> T.H.Hsieh
> 
> 
> On Tue, Sep 25, 2018 at 05:33:01PM +0800, Tung-Han Hsieh wrote:
> > Dear Andreas,
> > 
> > Thank you very much for your kind reply.
> > 
> > When I run "modprobe lustre", dmesg only reports:
> > 
> > [191843.804416] LNet: HW NUMA nodes: 2, HW CPU cores: 24, npartitions: 2
> > [191844.582597] Lustre: Lustre: Build Version: 2.10.0
> > 
> > and the command line prints "ERROR: could not insert 'lustre': No such device".
> > Checking "lsmod" shows the following Lustre modules loaded:
> > 
> > Module                  Size  Used by
> > lnet                  388690  0 
> > libcfs                214791  1 lnet
> > 
> > When I run "modprobe obdclass", the result is exactly the same.
> > 
> > I also tried to recompile Lustre-2.10.5 with the options:
> > 
> >     ./configure --prefix=/opt/lustre \
> >                 --with-linux=/path/to/linux-3.12.72 \
> >                 --disable-server
> > 
> > to make the situation simpler. But I still get exactly the same error.
> > 
> > BTW, my Linux OS is Debian 8.10, with kmod version 18-3, udev
> > version 215-17+deb8u7, linux kernel 3.12.72, gcc-4.9.2.
> > 
> > ==========================================================================
> > 
> > Then I wondered whether this error is due to the version of the
> > Linux OS. So I tried to compile Lustre-2.10.5 again with the options:
> > 
> >     ./configure --prefix=/opt/lustre \
> >                 --with-linux=/path/to/linux-4.9.110 \
> >                 --disable-server
> > 
> > on a newer machine: Linux OS Debian 9.5, with kmod version 23-2,
> > udev version 232-25+deb9u4, linux kernel 4.9.110, gcc-4.9.2. I needed
> > to comment out a few lines such as:
> > 
> >     .setxattr       = ll_setxattr,
> >     .getxattr       = ll_getxattr,
> >     .listxattr      = ll_listxattr,
> >     .removexattr    = ll_removexattr,
> > 
> > in "lustre/llite/symlink.c", "lustre/llite/namei.c", and
> > "lustre/llite/file.c" in order to build the Lustre source code
> > successfully. This time I could successfully run:
> > 
> > 	modprobe lustre
> > 
> > So, is the problem that my Linux system (or its utilities) is too old?
> > Is there a list of "System Requirements" for running Lustre-2.10.5?
> > 
> > P.S. I suggest that the "System Requirements" be documented in
> >     the release notes of the Lustre software. Every time I want to
> >     upgrade Lustre on my clusters, I have to spend a lot of time
> >     *guessing* the correct combination of versions of the system,
> >     the third-party libraries (e.g., ZFS), and Lustre itself to
> >     make everything work. Unfortunately, this information is not
> >     always easy to find.
> > 
> > 
> > Best Regards,
> > 
> > T.H.Hsieh
> > 
> > 
> > 
> > On Tue, Sep 25, 2018 at 07:38:00AM +0000, Andreas Dilger wrote:
> > > What does dmesg tell you?  Normally it will report that some module has incorrect symbols, which means you compiled against a different version of the kernel source, OFED/MOFED libraries, etc.
> > > 
> > > > On Sep 25, 2018, at 05:14, Tung-Han Hsieh <thhsieh at twcp1.phys.ntu.edu.tw> wrote:
> > > > 
> > > > Dear All,
> > > > 
> > > > I found that my lustre-2.10.5 with ZFS (either 0.7.9 or 0.7.11)
> > > > cannot load the "lustre" modules because it cannot load the
> > > > "obdclass.ko" module. The error message is the following:
> > > > 
> > > > # modprobe -v -v obdclass
> > > > insmod /lib/modules/3.12.72/updates/fs/lustre/obdclass.ko
> > > > libkmod: INFO ../libkmod/libkmod-module.c:829 kmod_module_insert_module: Failed to insert module '/lib/modules/3.12.72/updates/fs/lustre/obdclass.ko': No such device
> > > > ERROR: could not insert 'obdclass': No such device
> > > > libkmod: INFO ../libkmod/libkmod.c:319 kmod_unref: context 0x7fb945d321e0 released
> > > > 
> > > > Could anyone suggest how to debug this?
> > > > 
> > > > Thanks very much.
> > > > 
> > > > 
> > > > T.H.Hsieh
> > > > 
> > > > 
> > > > On Tue, Sep 25, 2018 at 12:14:00AM +0800, Tung-Han Hsieh wrote:
> > > >> Dear Nathaniel,
> > > >> 
> > > >> Thank you very much for your kind reply. Indeed, I modified the
> > > >> lustre-2.10.5 code:
> > > >> 
> > > >>    lustre/osd-zfs/osd_object.c
> > > >>    lustre/osd-zfs/osd_xattr.c
> > > >> 
> > > >> for the declaration:
> > > >> 
> > > >>    inode_timespec_t now;
> > > >> 
> > > >> This is similar to what you did in your patch, so I can compile
> > > >> lustre-2.10.5 cleanly with zfs-0.7.11. Sorry, I forgot to mention it.
> > > >> 
> > > >> But my problem is still there. I have just tried the following:
> > > >> 
> > > >> 1. I applied your patch to the original lustre-2.10.5 code and
> > > >>   recompiled with spl-0.7.11 and zfs-0.7.11. But loading the "lustre"
> > > >>   module still gives the "no such device" error.
> > > >> 
> > > >> 2. I recompiled the original lustre-2.10.5 with spl-0.7.9 and
> > > >>   zfs-0.7.9. It compiles cleanly, but again I get the
> > > >>   "no such device" error when loading the "lustre" module.
> > > >> 
> > > >> I suspect that I have overlooked a trivial step, for example
> > > >> that one (or some) of the utilities in /opt/lustre/sbin/* should
> > > >> be linked to /sbin/ or /usr/sbin/ ....
> > > >> 
> > > >> Any suggestions would be much appreciated.
> > > >> 
> > > >> Thank you very much.
> > > >> 
> > > >> 
> > > >> T.H.Hsieh
> > > >> 
> > > >> 
> > > >> On Mon, Sep 24, 2018 at 01:21:19PM +0000, Nathaniel Clark wrote:
> > > >>> Hello Tung-Han,
> > > >>> 
> > > >>> ZFS 0.7.11 doesn’t compile cleanly with Lustre, yet.
> > > >>> 
> > > >>> There’s a ticket for adding ZFS 0.7.11 support to lustre:
> > > >>> https://jira.whamcloud.com/browse/LU-11393
> > > >>> 
> > > >>> It has patches for master (pre-2.12) and a separate patch for 2.10.
> > > >>> 
> > > >>> —
> > > >>> Nathaniel Clark <nclark at whamcloud.com>
> > > >>> Senior Engineer
> > > >>> Whamcloud / DDN
> > > >>> 
> > > >>> On Sep 24, 2018, at 2:15 PM, Tung-Han Hsieh <thhsieh at twcp1.phys.ntu.edu.tw> wrote:
> > > >>> 
> > > >>> Dear All,
> > > >>> 
> > > >>> I am trying to install Lustre version 2.10.5 with ZFS-0.7.11
> > > >>> from source code. After compilation and installation, I tried
> > > >>> to load the "lustre" module, but encountered the following
> > > >>> error:
> > > >>> 
> > > >>> # modprobe lustre
> > > >>> could not load module 'lustre': no such device
> > > >>> 
> > > >>> My procedure of installation is the following:
> > > >>> 
> > > >>> 1. Compile vanilla kernel 3.12.72 downloaded from:
> > > >>>  https://mirrors.edge.kernel.org/pub/linux/kernel/v3.x/linux-3.12.72.tar.gz
> > > >>> 
> > > >>> 2. Compile spl-0.7.11 downloaded from:
> > > >>>  https://github.com/zfsonlinux/zfs/releases/download/zfs-0.7.11/spl-0.7.11.tar.gz
> > > >>> 
> > > >>>  with the following steps:
> > > >>>  # ./configure --prefix=/opt/lustre --with-linux=/path/to/linux-3.12.72
> > > >>>  # make
> > > >>>  # make install
> > > >>> 
> > > >>> 3. Compile zfs-0.7.11 downloaded from:
> > > >>>  https://github.com/zfsonlinux/zfs/releases/download/zfs-0.7.11/zfs-0.7.11.tar.gz
> > > >>> 
> > > >>>  with the following steps:
> > > >>>  # ./configure --prefix=/opt/lustre \
> > > >>>                --with-linux=/path/to/linux-3.12.72 \
> > > >>>                --with-spl=/path/to/spl-0.7.11
> > > >>>  # make
> > > >>>  # make install
> > > >>> 
> > > >>> 4. Compile lustre downloaded from:
> > > >>>  https://downloads.whamcloud.com/public/lustre/lustre-2.10.5/sles12sp3/client/SRPMS/lustre-2.10.5-1.src.rpm
> > > >>> 
> > > >>>  Then I unpack the SRPM by the command:
> > > >>>  # rpm2cpio lustre-2.10.5-1.src.rpm | cpio --extract --make-directories
> > > >>> 
> > > >>>  and compile it by the following:
> > > >>>  # ./configure --prefix=/opt/lustre \
> > > >>>                --with-linux=/path/to/linux-3.12.72 \
> > > >>>                --with-spl=/path/to/spl-0.7.11 \
> > > >>>                --with-zfs=/path/to/zfs-0.7.11 \
> > > >>>                --with-o2ib=no \
> > > >>>                --disable-ldiskfs
> > > >>>  # make
> > > >>>  # make install
> > > >>> 
> > > >>> 5. I have made sure the following settings and utilities are correct:
> > > >>>  - PATH contains /opt/lustre/bin and /opt/lustre/sbin
> > > >>>  - /sbin/mount.lustre exists.
> > > >>>  - /sbin/mount.zfs exists.
> > > >>>  - /usr/sbin/l_getidentity exists.
> > > >>>  - /usr/sbin/ko2iblnd-probe exists.
> > > >>>  - /etc/modprobe.d/lustre.conf contains:
> > > >>>    options lnet networks=tcp
> > > >>>  - /etc/modprobe.d/ko2iblnd.conf contains:
> > > >>>    alias ko2iblnd-opa ko2iblnd
> > > >>>    options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1
> > > >>>    install ko2iblnd /usr/sbin/ko2iblnd-probe
> > > >>> 
> > > >>> Then I tried to run "modprobe lustre", and it failed with the "no such device" error.
> > > >>> 
> > > >>> I then replaced Lustre-2.10.5 with Lustre-2.9, downloaded from:
> > > >>> 
> > > >>> https://downloads.whamcloud.com/public/lustre/lustre-2.9.0/sles12sp1/client/SRPMS/lustre-2.9.0-1.src.rpm
> > > >>> 
> > > >>> and followed exactly the same installation steps. Everything works fine.
> > > >>> 
> > > >>> Could anyone suggest what I have missed for lustre-2.10.5, or
> > > >>> how to debug this?
> > > >>> 
> > > >>> Thanks very much.
> > > >>> 
> > > >>> 
> > > >>> T.H.Hsieh
> > > >>> _______________________________________________
> > > >>> lustre-discuss mailing list
> > > >>> lustre-discuss at lists.lustre.org
> > > >>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> > > >>> 
> > > 
> > > Cheers, Andreas
> > > ---
> > > Andreas Dilger
> > > CTO Whamcloud
> > > 
> > > 
> > > 
> > > 
> > 
> > 

