[Lustre-discuss] Is OFED 'kernel-ib' required for o2ib on RHEL5?

Lawrence Sorrillo sorrillo at jlab.org
Tue Mar 23 07:53:06 PDT 2010


Ken:

This is a wonderful post, and very helpful indeed. Thank you.

I also noticed something while trying to compile Lustre. The
"Module.symvers" file is essential, but it is only created after you run
"make" / "make modules" in the /usr/src/linux directory. It does NOT exist
before then, although the Lustre installation guide makes it seem
like this file should exist as soon as you unpack the sources. It does
not. Hence you cannot go straight on to run configure on Lustre and
expect the modules to work.
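
A quick way to catch this before running configure is to check that the file is really there. This is only a rough sketch: the function name is made up, and /usr/src/linux is simply the tree mentioned above, so adjust the path for your system.

```python
import os


def find_module_symvers(kernel_tree="/usr/src/linux"):
    """Return the path to Module.symvers if it exists and is non-empty, else None.

    Hypothetical helper: the default path is the tree named in this post,
    not something Lustre's configure script is known to require.
    """
    path = os.path.join(kernel_tree, "Module.symvers")
    if os.path.isfile(path) and os.path.getsize(path) > 0:
        return path
    return None


if __name__ == "__main__":
    found = find_module_symvers()
    if found:
        print("found", found)
    else:
        print("Module.symvers is missing: build the kernel tree first")
```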

~Lawrence



Ken Hornstein wrote:
>> You're right, I had problems with the module symbol versions using
>> Lustre 1.8.2 packages available at Sun website, kernel
>> 2.6.18-164.11.1.el5 (RHEL 5.4) and OFED 1.5. The same problems happens
>> when using OFED 1.4.2.
>>     
>
> So since this comes up now and then, I've cc'd the list.
>
> So you can Google around to find more about kernel symbol versioning.
> The short answer is that there is a CRC associated with each exported
> symbol in the loaded kernel, and that version is recorded in the module
> when it is compiled.  That's all well and good, but figuring out what
> went wrong when it doesn't work is a pain, because the information
> isn't all in one place (and nobody has explained it well, at least
> that I've seen).
>
> When a module (like Lustre) is compiled, it's pointed at a file called
> "Module.symvers"; that contains the versions of the symbols that
> modules are expected to link against, and those versions are recorded
> in the module object file.  When you get this mismatch at module load
> time, one of two things is happening: the "wrong" OFed is being loaded,
> or you linked against the "wrong" Module.symvers file.
>
> How do you figure out which one is the problem?  Well, let's take a
> common OFed symbol, like rdma_connect.  You can find out the version of
> this symbol by grep'ing /proc/kallsyms.  On our system:
>
> # grep rdma_connect /proc/kallsyms 
> ffffffffa0375510 u rdma_connect [ko2iblnd]
> ffffffffa0375510 u rdma_connect [rdma_ucm]
> ffffffffa0375510 u rdma_connect [ib_sdp]
> ffffffffa0377000 r __ksymtab_rdma_connect       [rdma_cm]
> ffffffffa0377225 r __kstrtab_rdma_connect       [rdma_cm]
> ffffffffa03770f0 r __kcrctab_rdma_connect       [rdma_cm]
> 000000000ef3a1e8 a __crc_rdma_connect   [rdma_cm]
> ffffffffa0375510 T rdma_connect [rdma_cm]
>
> The symbol you care about is the absolute symbol, the one prefixed by
> __crc.  So in this case, we are interested in __crc_rdma_connect, and
> that symbol's version is 0x0ef3a1e8.  This is the version used by the
> currently running kernel.
>
> Which version is Lustre linked against?  Well, for that you need to
> find the ko2iblnd.ko file, and dump the __versions section.
>
> # objdump -s -j __versions ko2iblnd.ko | less
> [...]
> 0670 00000000 00000000 00000000 00000000  ................
> 0680 e8a1f30e 00000000 72646d61 5f636f6e  ........rdma_con
> 0690 6e656374 00000000 00000000 00000000  nect............
> 06a0 00000000 00000000 00000000 00000000  ................
>
> This display isn't as pretty, but you want to look in the hex dump
> just before the symbol name.  In this case, right before rdma_connect,
> you will see "e8a1f30e" ... which is the little-endian version of our
> symbol version!  So they match up, and everything works.
>
> If you want to find out which symbol version is in a particular OFed module
> (in this case, we want to look at rdma_cm.ko), you can do this:
>
> # nm ./kernel/drivers/infiniband/core/rdma_cm.ko | grep rdma_connect
> 00000000cd7aa3e6 A __crc_rdma_connect
>
> Wrong version!  But we're ACTUALLY using the module located here:
>
> # nm ./updates/kernel/drivers/infiniband/core/rdma_cm.ko | grep rdma_connect
> 000000000ef3a1e8 A __crc_rdma_connect
>
> Which is the "correct" version.  But if you LINK against the first
> version, you'll get these errors when you try to load Lustre.  Note
> that my Module.symvers file for this kernel contains:
>
> 0xcd7aa3e6      rdma_connect    drivers/infiniband/core/rdma_cm EXPORT_SYMBOL
>
> Which is wrong!  In this case, you need to explicitly point Lustre at
> the OFed directory which contains the Module.symvers file.
>
> (Can you tell I've beaten my head against the wall over this issue
> a WHOLE LOT? :-/)
>
> --Ken
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>   
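
The comparison Ken describes can also be scripted. Below is a minimal sketch in Python, assuming the three line formats quoted above: a "__crc_" line from /proc/kallsyms or nm, a Module.symvers line, and the 4-byte word that precedes the symbol name in the objdump hex dump. The function names are invented for illustration and are not part of any Lustre or OFED tool.

```python
import struct


def crc_from_versions_word(hex_word):
    """Decode the 4 raw bytes shown by `objdump -s` as a little-endian CRC."""
    return struct.unpack("<I", bytes.fromhex(hex_word))[0]


def crc_from_symvers_line(line):
    """Parse a Module.symvers line: '<0xCRC> <symbol> <module> <export type>'."""
    fields = line.split()
    return fields[1], int(fields[0], 16)


def crc_from_crc_symbol_line(line):
    """Parse a kallsyms or nm line: '<hex value> <type> __crc_<symbol> ...'."""
    fields = line.split()
    name = fields[2]
    if not name.startswith("__crc_"):
        raise ValueError("not a __crc_ symbol line: %r" % line)
    return name[len("__crc_"):], int(fields[0], 16)


if __name__ == "__main__":
    # Values copied from the quoted message.
    running = crc_from_crc_symbol_line(
        "000000000ef3a1e8 a __crc_rdma_connect   [rdma_cm]")
    symvers = crc_from_symvers_line(
        "0xcd7aa3e6      rdma_connect    drivers/infiniband/core/rdma_cm EXPORT_SYMBOL")
    linked = crc_from_versions_word("e8a1f30e")

    print("ko2iblnd.ko matches running kernel:", linked == running[1])
    print("Module.symvers matches running kernel:", symvers[1] == running[1])
```

If the second check is false, as it is with the values above, that Module.symvers is not the one to build Lustre against.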




