[Lustre-discuss] lustre ofed compatibility

Edward Walter ewalter at cs.cmu.edu
Thu Jun 9 15:36:25 PDT 2011


We rebooted after installing the Lustre rpms so that we would be sure 
that OFED built against the running kernel.  We've also rebooted a 
couple of times since then and tried to manually load modules.

Here's the output from /etc/infiniband/info:
> prefix=/opt/ofed
> Kernel=2.6.18-194.3.1.el5_lustre.1.8.4
>
> Configure options: --with-core-mod --with-user_mad-mod 
> --with-user_access-mod --with-addr_trans-mod --with-mthca-mod 
> --with-mlx4-mod --with-mlx4_en-mod --with-cxgb3-mod --with-nes-mod 
> --with-ipoib-mod

and from 'uname -r':
> 2.6.18-194.3.1.el5_lustre.1.8.4

The kernel-ib version looks correct too:
> # rpm -qa |grep kernel-ib
> kernel-ib-1.5.1-2.6.18_194.3.1.el5_lustre.1.8.4

Doing a manual modprobe on lustre also fails:
> # modprobe lustre
> WARNING: Error inserting osc 
> (/lib/modules/2.6.18-194.3.1.el5_lustre.1.8.4/updates/kernel/fs/lustre/osc.ko): 
> Unknown symbol in module, or unknown parameter (see dmesg)
> WARNING: Error inserting mdc 
> (/lib/modules/2.6.18-194.3.1.el5_lustre.1.8.4/updates/kernel/fs/lustre/mdc.ko): 
> Unknown symbol in module, or unknown parameter (see dmesg)
> WARNING: Error inserting lov 
> (/lib/modules/2.6.18-194.3.1.el5_lustre.1.8.4/updates/kernel/fs/lustre/lov.ko): 
> Unknown symbol in module, or unknown parameter (see dmesg)
> FATAL: Error inserting lustre 
> (/lib/modules/2.6.18-194.3.1.el5_lustre.1.8.4/updates/kernel/fs/lustre/lustre.ko): 
> Unknown symbol in module, or unknown parameter (see dmesg)
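
Since modprobe only points at dmesg, it can help to reduce the kernel log to the list of symbols each module disagrees about. Below is a minimal Python sketch of that filter. The function name disagreeing_symbols is hypothetical, and the sketch assumes the log lines have the "module: disagrees about version of symbol name" shape seen in the dmesg output quoted later in this message.

```python
import re

# Matches lines such as:
#   ko2iblnd: disagrees about version of symbol ib_create_cq
# search() is used so that a leading timestamp or log prefix is tolerated.
DISAGREES = re.compile(r"(\S+): disagrees about version of symbol (\S+)")


def disagreeing_symbols(log_text):
    """Return {module: [symbols...]} for every version disagreement in the log."""
    found = {}
    for line in log_text.splitlines():
        match = DISAGREES.search(line)
        if match:
            module, symbol = match.groups()
            # Keep symbols in the order they appear in the log.
            found.setdefault(module, []).append(symbol)
    return found
```

Feeding it the text captured from dmesg gives one entry per failing module, which makes it easier to see whether every unresolved symbol comes from the same provider (here, the IB/RDMA core).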

As far as symbol versions are concerned, aren't these all defined in the 
kernel-headers and kernel-devel packages?  The versions we're using 
match our Lustre kernel version:
> # rpm -qa |grep kernel-devel
> kernel-devel-2.6.18-194.3.1.el5_lustre.1.8.4
> # rpm -qa |grep kernel-headers
> kernel-headers-2.6.18-194.3.1.el5_lustre.1.8.4
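
Matching package versions do not by themselves show that the symbol checksums agree: a "disagrees about version of symbol" message means the CRC a module recorded at build time differs from the CRC exported by whatever currently provides that symbol. A minimal Python sketch of that comparison follows. It assumes both inputs are text in which each line begins with a hexadecimal CRC followed by the symbol name, which is the layout of a Module.symvers file and of `modprobe --dump-modversions` output. The function names and the CRC values in the example are made up for illustration.

```python
def parse_crcs(text):
    """Map symbol name -> CRC for lines shaped like '0x<crc> <symbol> ...'."""
    crcs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("0x"):
            crcs[fields[1]] = int(fields[0], 16)
    return crcs


def mismatched(expected, exported):
    """Symbols present in both maps whose CRCs differ, sorted by name."""
    return sorted(
        name for name, crc in expected.items()
        if name in exported and exported[name] != crc
    )


# Made-up CRC values, for illustration only.
EXPECTED_BY_MODULE = "0x11111111\tib_create_cq\n0x22222222\tib_alloc_pd\n"
EXPORTED_BY_OFED = (
    "0xaaaaaaaa\tib_create_cq\tdrivers/infiniband/core/ib_core\tEXPORT_SYMBOL\n"
    "0x22222222\tib_alloc_pd\tdrivers/infiniband/core/ib_core\tEXPORT_SYMBOL\n"
)

print(mismatched(parse_crcs(EXPECTED_BY_MODULE), parse_crcs(EXPORTED_BY_OFED)))
```

In practice the first input would be the versions ko2iblnd.ko was built with and the second the symbols exported by the installed kernel-ib build. Any symbol the function reports is one the kernel will refuse to resolve.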

Thanks again.

-Ed


On 06/09/2011 05:53 PM, Hebenstreit, Michael wrote:
> Are you sure you rebooted after installing the modules? Otherwise 
> this looks like a build error where outdated symbols were used.
> Michael
>
> ------------------------------------------------------------------------
> *From:* lustre-discuss-bounces at lists.lustre.org 
> [mailto:lustre-discuss-bounces at lists.lustre.org] *On Behalf Of *Edward 
> Walter
> *Sent:* Thursday, June 09, 2011 1:56 PM
> *To:* lustre-discuss at lists.lustre.org
> *Subject:* Re: [Lustre-discuss] lustre ofed compatibility
>
> Thanks for all of the advice here.  We seem to be running into a 
> hiccup using Lustre 1.8.4 with O2IB and OFED 1.5.1.
>
> First of all, our lustre servers are all up and running fine (using 
> the vendor OFED - 1.4.1). Our trouble is all client side.
>
> We want to use a newer OFED (1.5.1) to potentially enable NFS over 
> RDMA (we have NFS servers in addition to lustre).
>
> We installed the current Lustre 1.8.4 rpms from Sun/Oracle:
>> kernel-2.6.18-194.3.1.el5_lustre.1.8.4
>> lustre-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4
>> lustre-modules-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4
>>
>> kernel-devel-2.6.18-194.3.1.el5_lustre.1.8.4
>> kernel-headers-2.6.18-194.3.1.el5_lustre.1.8.4
>
> We rebooted with kernel-2.6.18-194.3.1.el5_lustre.1.8.4.
>
> Next we downloaded the OFED 1.5.1 sources and built the basic and hpc 
> packages.  These built and installed without incident.  I don't 
> believe the OpenFabrics group provides binary RPMs; otherwise, we 
> would have used them.
>
> Here are the lustre/IB lines from our modprobe.conf:
>> alias ib0 ib_ipoib
>> alias net-pf-27 ib_sdp
>> options lnet networks=o2ib
>
> And our fstab:
>> 172.16.1.3@o2ib:172.16.1.4@o2ib:/data  /lustre  lustre  defaults,_netdev,localflock 0 0
>
> OpenIB is working properly: we have a subnet manager running and can 
> ping our Lustre OSS and MDS servers over IB.
>
> Trying to mount /lustre generates the following error:
>> mount.lustre: mount 172.16.1.3@o2ib:172.16.1.4@o2ib:/data at /lustre 
>> failed: No such device
>> Are the lustre modules loaded?
>> Check /etc/modprobe.conf and /proc/filesystems
>> Note 'alias lustre llite' should be removed from modprobe.conf
>
> dmesg shows that the ko2iblnd module cannot be loaded:
>> Lustre: OBD class driver, http://www.lustre.org/
>> Lustre:     Lustre Version: 1.8.4
>> Lustre:     Build Version: 
>> 1.8.4-20100723170646-PRISTINE-2.6.18-194.3.1.el5_lustre.1.8.4
>> ko2iblnd: disagrees about version of symbol ib_fmr_pool_unmap
>> ko2iblnd: Unknown symbol ib_fmr_pool_unmap
>> ko2iblnd: disagrees about version of symbol ib_create_cq
>> ko2iblnd: Unknown symbol ib_create_cq
>> ko2iblnd: disagrees about version of symbol rdma_resolve_addr
>> ko2iblnd: Unknown symbol rdma_resolve_addr
>> ko2iblnd: disagrees about version of symbol ib_reg_phys_mr
>> ko2iblnd: Unknown symbol ib_reg_phys_mr
>> ko2iblnd: disagrees about version of symbol ib_create_fmr_pool
>> ko2iblnd: Unknown symbol ib_create_fmr_pool
>> ko2iblnd: disagrees about version of symbol ib_dereg_mr
>> ko2iblnd: Unknown symbol ib_dereg_mr
>> ko2iblnd: disagrees about version of symbol rdma_reject
>> ko2iblnd: Unknown symbol rdma_reject
>> ko2iblnd: disagrees about version of symbol rdma_disconnect
>> ko2iblnd: Unknown symbol rdma_disconnect
>> ko2iblnd: disagrees about version of symbol rdma_resolve_route
>> ko2iblnd: Unknown symbol rdma_resolve_route
>> ko2iblnd: disagrees about version of symbol rdma_bind_addr
>> ko2iblnd: Unknown symbol rdma_bind_addr
>> ko2iblnd: disagrees about version of symbol rdma_create_qp
>> ko2iblnd: Unknown symbol rdma_create_qp
>> ko2iblnd: disagrees about version of symbol ib_destroy_cq
>> ko2iblnd: Unknown symbol ib_destroy_cq
>> ko2iblnd: disagrees about version of symbol rdma_create_id
>> ko2iblnd: Unknown symbol rdma_create_id
>> ko2iblnd: disagrees about version of symbol rdma_listen
>> ko2iblnd: Unknown symbol rdma_listen
>> ko2iblnd: disagrees about version of symbol rdma_destroy_qp
>> ko2iblnd: Unknown symbol rdma_destroy_qp
>> ko2iblnd: disagrees about version of symbol ib_query_device
>> ko2iblnd: Unknown symbol ib_query_device
>> ko2iblnd: disagrees about version of symbol ib_get_dma_mr
>> ko2iblnd: Unknown symbol ib_get_dma_mr
>> ko2iblnd: disagrees about version of symbol ib_alloc_pd
>> ko2iblnd: Unknown symbol ib_alloc_pd
>> ko2iblnd: disagrees about version of symbol rdma_connect
>> ko2iblnd: Unknown symbol rdma_connect
>> ko2iblnd: disagrees about version of symbol ib_modify_qp
>> ko2iblnd: Unknown symbol ib_modify_qp
>> ko2iblnd: disagrees about version of symbol rdma_destroy_id
>> ko2iblnd: Unknown symbol rdma_destroy_id
>> ko2iblnd: disagrees about version of symbol rdma_accept
>> ko2iblnd: Unknown symbol rdma_accept
>> ko2iblnd: disagrees about version of symbol ib_dealloc_pd
>> ko2iblnd: Unknown symbol ib_dealloc_pd
>> ko2iblnd: disagrees about version of symbol ib_fmr_pool_map_phys
>> ko2iblnd: Unknown symbol ib_fmr_pool_map_phys
>> LustreError: 7461:0:(api-ni.c:1081:lnet_startup_lndnis()) Can't load 
>> LND o2ib, module ko2iblnd, rc=256
>> LustreError: 7461:0:(events.c:725:ptlrpc_init_portals()) network 
>> initialisation failed
>
> Am I missing something obvious here?
>
> Thanks much.
>
> -Ed
>
> On 06/05/2011 05:48 AM, Wu, Yilei wrote:
>> We have been using OFED 1.5.1 with Lustre 1.8.4 on a 400-node 
>> cluster based on RHEL 5.4, and have had no problems at all.
>>
>> One thing needs attention:
>>
>> If you use the default OFED 1.5.1, just install the RPM packages; 
>> there is no need to build either Lustre or OFED.
>>
>> If you use a revised driver, such as BX-OFED 1.5.1, you may in some 
>> cases need to recompile the Linux kernel with a larger stack size, 
>> because Lustre and OFED are both stack-greedy and together can 
>> exhaust the stack, which leads to system hangs.
>>
>> YiLei
>>
>>
>> On Thu, Jun 2, 2011 at 1:36 AM, Kevin Van Maren 
>> <kevin.van.maren at oracle.com> wrote:
>>
>>     OFED 1.5.1 should work fine with Lustre 1.8.4, although I believe
>>     more people are using the in-kernel OFED now: Lustre (finally)
>>     defaulted to the in-kernel OFED for RedHat, so it is no longer
>>     _necessary_ to build either OFED or Lustre.
>>
>>     Kevin
>>
>>
>>     Edward Walter wrote:
>>     > Hi List,
>>     >
>>     > We're getting ready to upgrade the OS/software stack on one of our
>>     > clusters and I'm looking at which Lustre and OFED versions will
>>     > work best.
>>     >
>>     > It looks like the changelog for 1.8.4 and the compatibility
>>     > matrix have conflicting information.
>>     >
>>     > The Lustre compatibility matrix indicates that on Lustre 1.8.4, the
>>     > highest OFED revision with o2iblnd support is 1.4.2:
>>     > http://wiki.lustre.org/index.php/Lustre_Release_Information
>>     >
>>     > The changelog for 1.8.4 indicates that o2iblnd is supported
>>     > with OFED 1.5.1:
>>     > http://wiki.lustre.org/index.php/Change_Log_1.8#Changes_from_v1.8.3_to_v1.8.4
>>     >
>>     > Can someone clarify whether 1.8.4 supports o2iblnd with OFED
>>     > 1.5.1?  Are there any pitfalls to this configuration?  Has anyone
>>     > found any instabilities with this configuration?
>>     >
>>     > Thanks much.
>>     >
>>     > -Ed Walter
>>     > Carnegie Mellon University
>>     > _______________________________________________
>>     > Lustre-discuss mailing list
>>     > Lustre-discuss at lists.lustre.org
>>     <mailto:Lustre-discuss at lists.lustre.org>
>>     > http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>     >
>>