[Lustre-discuss] lustre ofed compatibility

Kevin Van Maren Kevin.Van.Maren at oracle.com
Thu Jun 9 16:38:14 PDT 2011


You must rebuild Lustre if you replace OFED.

Kevin
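
A note on reading the dmesg output quoted below: each "disagrees about
version of symbol" line means the symbol CRC recorded in ko2iblnd when it
was built differs from the one exported by the InfiniBand/RDMA modules
that are now installed. That is what you would expect when the Lustre
RPMs were built against a different OFED. The following is a minimal
Python sketch, not part of the original thread, that pulls the affected
symbols out of a saved dmesg log. The function names mismatched_symbols
and prefixes are made up for illustration.

```python
import re
from collections import defaultdict

# Matches kernel log lines such as:
#   ko2iblnd: disagrees about version of symbol ib_create_cq
MISMATCH_RE = re.compile(r"(\S+): disagrees about version of symbol (\S+)")


def mismatched_symbols(log_text):
    """Return {module_name: sorted list of symbols with a version mismatch}."""
    found = defaultdict(set)
    for line in log_text.splitlines():
        match = MISMATCH_RE.search(line)
        if match:
            module, symbol = match.groups()
            found[module].add(symbol)
    return {module: sorted(symbols) for module, symbols in found.items()}


def prefixes(symbols):
    """Return the distinct leading name parts, e.g. 'ib' or 'rdma'."""
    return sorted({s.split("_", 1)[0] for s in symbols})


if __name__ == "__main__":
    # Hypothetical excerpt in the same format as the log in this thread.
    sample = """\
ko2iblnd: disagrees about version of symbol ib_fmr_pool_unmap
ko2iblnd: Unknown symbol ib_fmr_pool_unmap
ko2iblnd: disagrees about version of symbol rdma_resolve_addr
ko2iblnd: Unknown symbol rdma_resolve_addr
"""
    for module, symbols in mismatched_symbols(sample).items():
        print(module, prefixes(symbols), symbols)
```

If every mismatched symbol starts with ib_ or rdma_, as in the log below,
that points at the OFED stack rather than the base kernel as the source
of the mismatch.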


On Jun 9, 2011, at 4:55 PM, Edward Walter <ewalter at cs.cmu.edu> wrote:

> Thanks for all of the advice here.  We seem to be running into a
> hiccup using Lustre 1.8.4 with O2IB and OFED 1.5.1.
>
> First of all, our lustre servers are all up and running fine (using  
> the vendor OFED - 1.4.1). Our trouble is all client side.
>
> We want to use a newer OFED (1.5.1) to potentially enable NFS
> over RDMA (we have NFS servers in addition to lustre).
>
> We installed the current Lustre 1.8.4 rpms from Sun/Oracle:
>> kernel-2.6.18-194.3.1.el5_lustre.1.8.4
>> lustre-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4
>> lustre-modules-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4
>>
>> kernel-devel-2.6.18-194.3.1.el5_lustre.1.8.4
>> kernel-headers-2.6.18-194.3.1.el5_lustre.1.8.4
>
> We rebooted with kernel-2.6.18-194.3.1.el5_lustre.1.8.4.
>
> Next we downloaded the OFED 1.5.1 sources and built the basic and
> hpc packages.  These built and installed without incident.  I don't
> believe the OpenFabrics group provides binary RPMs; otherwise, we
> would have used them.
>
> Here are the lustre/IB lines from our modprobe.conf:
>> alias ib0 ib_ipoib
>> alias net-pf-27 ib_sdp
>> options lnet networks=o2ib
>
> And our fstab:
>> 172.16.1.3@o2ib:172.16.1.4@o2ib:/data  /lustre  lustre  defaults,_netdev,localflock 0 0
>
> OpenIB is working properly, we have a subnet manager running and can  
> ping our Lustre OSS and MDS servers over IB.
>
> Trying to mount /lustre generates the following error:
>> mount.lustre: mount 172.16.1.3@o2ib:172.16.1.4@o2ib:/data at /lustre failed: No such device
>> Are the lustre modules loaded?
>> Check /etc/modprobe.conf and /proc/filesystems
>> Note 'alias lustre llite' should be removed from modprobe.conf
>
> dmesg shows that the ko2iblnd module cannot be loaded:
>> Lustre: OBD class driver, http://www.lustre.org/
>> Lustre:     Lustre Version: 1.8.4
>> Lustre:     Build Version: 1.8.4-20100723170646-PRISTINE-2.6.18-194.3.1.el5_lustre.1.8.4
>> ko2iblnd: disagrees about version of symbol ib_fmr_pool_unmap
>> ko2iblnd: Unknown symbol ib_fmr_pool_unmap
>> ko2iblnd: disagrees about version of symbol ib_create_cq
>> ko2iblnd: Unknown symbol ib_create_cq
>> ko2iblnd: disagrees about version of symbol rdma_resolve_addr
>> ko2iblnd: Unknown symbol rdma_resolve_addr
>> ko2iblnd: disagrees about version of symbol ib_reg_phys_mr
>> ko2iblnd: Unknown symbol ib_reg_phys_mr
>> ko2iblnd: disagrees about version of symbol ib_create_fmr_pool
>> ko2iblnd: Unknown symbol ib_create_fmr_pool
>> ko2iblnd: disagrees about version of symbol ib_dereg_mr
>> ko2iblnd: Unknown symbol ib_dereg_mr
>> ko2iblnd: disagrees about version of symbol rdma_reject
>> ko2iblnd: Unknown symbol rdma_reject
>> ko2iblnd: disagrees about version of symbol rdma_disconnect
>> ko2iblnd: Unknown symbol rdma_disconnect
>> ko2iblnd: disagrees about version of symbol rdma_resolve_route
>> ko2iblnd: Unknown symbol rdma_resolve_route
>> ko2iblnd: disagrees about version of symbol rdma_bind_addr
>> ko2iblnd: Unknown symbol rdma_bind_addr
>> ko2iblnd: disagrees about version of symbol rdma_create_qp
>> ko2iblnd: Unknown symbol rdma_create_qp
>> ko2iblnd: disagrees about version of symbol ib_destroy_cq
>> ko2iblnd: Unknown symbol ib_destroy_cq
>> ko2iblnd: disagrees about version of symbol rdma_create_id
>> ko2iblnd: Unknown symbol rdma_create_id
>> ko2iblnd: disagrees about version of symbol rdma_listen
>> ko2iblnd: Unknown symbol rdma_listen
>> ko2iblnd: disagrees about version of symbol rdma_destroy_qp
>> ko2iblnd: Unknown symbol rdma_destroy_qp
>> ko2iblnd: disagrees about version of symbol ib_query_device
>> ko2iblnd: Unknown symbol ib_query_device
>> ko2iblnd: disagrees about version of symbol ib_get_dma_mr
>> ko2iblnd: Unknown symbol ib_get_dma_mr
>> ko2iblnd: disagrees about version of symbol ib_alloc_pd
>> ko2iblnd: Unknown symbol ib_alloc_pd
>> ko2iblnd: disagrees about version of symbol rdma_connect
>> ko2iblnd: Unknown symbol rdma_connect
>> ko2iblnd: disagrees about version of symbol ib_modify_qp
>> ko2iblnd: Unknown symbol ib_modify_qp
>> ko2iblnd: disagrees about version of symbol rdma_destroy_id
>> ko2iblnd: Unknown symbol rdma_destroy_id
>> ko2iblnd: disagrees about version of symbol rdma_accept
>> ko2iblnd: Unknown symbol rdma_accept
>> ko2iblnd: disagrees about version of symbol ib_dealloc_pd
>> ko2iblnd: Unknown symbol ib_dealloc_pd
>> ko2iblnd: disagrees about version of symbol ib_fmr_pool_map_phys
>> ko2iblnd: Unknown symbol ib_fmr_pool_map_phys
>> LustreError: 7461:0:(api-ni.c:1081:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256
>> LustreError: 7461:0:(events.c:725:ptlrpc_init_portals()) network initialisation failed
>
> Am I missing something obvious here?
>
> Thanks much.
>
> -Ed
>
> On 06/05/2011 05:48 AM, Wu, Yilei wrote:
>>
>> We have been using OFED 1.5.1 with Lustre 1.8.4 on a 400-node
>> cluster running RHEL 5.4, with no problems at all.
>>
>> One thing needs attention:
>>
>> If you use the default OFED 1.5.1, just install the RPM packages;
>> there is no need to build either Lustre or OFED.
>>
>> If you use a modified driver stack, such as BX-OFED 1.5.1, you may
>> in some cases need to recompile the Linux kernel with a larger stack
>> size, because Lustre and OFED are both stack-hungry and can exhaust
>> the stack, which leads to system hangs.
>>
>> YiLei
>>
>>
>> On Thu, Jun 2, 2011 at 1:36 AM, Kevin Van Maren <kevin.van.maren at oracle.com> wrote:
>> OFED 1.5.1 should work fine with Lustre 1.8.4, although I believe more
>> people are using the in-kernel OFED now: Lustre (finally) defaulted to
>> the in-kernel OFED for RedHat, so it is no longer _necessary_ to build
>> either OFED or Lustre.
>>
>> Kevin
>>
>>
>> Edward Walter wrote:
>> > Hi List,
>> >
>> > We're getting ready to upgrade the OS/software stack on one of our
>> > clusters and I'm looking at which Lustre and OFED versions will
>> > work best.
>> >
>> > It looks like the changelog for 1.8.4 and the compatibility matrix
>> > have conflicting information.
>> >
>> > The Lustre compatibility matrix indicates that on Lustre 1.8.4, the
>> > highest OFED revision with o2iblnd support is 1.4.2:
>> > http://wiki.lustre.org/index.php/Lustre_Release_Information
>> >
>> > The changelog for 1.8.4 indicates that o2iblnd is supported with
>> > OFED 1.5.1:
>> > http://wiki.lustre.org/index.php/Change_Log_1.8#Changes_from_v1.8.3_to_v1.8.4
>> >
>> >
>> > Can someone clarify whether 1.8.4 supports o2iblnd with OFED 1.5.1?
>> > Are there any pitfalls to this configuration?  Has anyone found any
>> > instabilities with this configuration?
>> >
>> > Thanks much.
>> >
>> > -Ed Walter
>> > Carnegie Mellon University
>> > _______________________________________________
>> > Lustre-discuss mailing list
>> > Lustre-discuss at lists.lustre.org
>> > http://lists.lustre.org/mailman/listinfo/lustre-discuss
>> >
>>