[Lustre-discuss] lustre ofed compatibility

Edward Walter ewalter at cs.cmu.edu
Thu Jun 9 13:55:45 PDT 2011


Thanks for all of the advice here.  We seem to be running into a hiccup 
using Lustre 1.8.4 with O2IB and OFED 1.5.1.

First of all, our lustre servers are all up and running fine (using the 
vendor OFED - 1.4.1). Our trouble is all client side.

We want to use a newer OFED (1.5.1) to potentially enable NFS over RDMA 
(we have NFS servers in addition to lustre).

We installed the current Lustre 1.8.4 rpms from Sun/Oracle:
> kernel-2.6.18-194.3.1.el5_lustre.1.8.4
> lustre-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4
> lustre-modules-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4
>
> kernel-devel-2.6.18-194.3.1.el5_lustre.1.8.4
> kernel-headers-2.6.18-194.3.1.el5_lustre.1.8.4

We rebooted with kernel-2.6.18-194.3.1.el5_lustre.1.8.4.

Next we downloaded the OFED 1.5.1 sources and built the basic and hpc 
packages.  These built and installed without incident.  I don't believe 
the OpenFabrics group provides binary RPMs; otherwise, we would have used 
them.

Here are the lustre/IB lines from our modprobe.conf:
> alias ib0 ib_ipoib
> alias net-pf-27 ib_sdp
> options lnet networks=o2ib

And our fstab:
> 172.16.1.3@o2ib:172.16.1.4@o2ib:/data  /lustre  lustre  defaults,_netdev,localflock 0 0
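
For readers unfamiliar with the device field in that fstab entry, here is a 
minimal sketch of how it breaks down, assuming the usual Lustre form 
nid[:nid...]:/fsname with each NID written as address@network.  The 
function name parse_lustre_device is hypothetical and is not part of any 
Lustre tool; the two NIDs are presumably a primary and a failover MGS.

```python
def parse_lustre_device(device):
    """Split 'nid[:nid...]:/fsname' into (list of NIDs, fsname).

    Hypothetical helper for illustration only.
    """
    # The filesystem name follows the last ':/' in the string.
    nids_part, sep, fsname = device.rpartition(":/")
    if not sep or not nids_part or not fsname:
        raise ValueError("expected nid[:nid...]:/fsname, got %r" % device)
    # Colons separate the individual NIDs.
    return nids_part.split(":"), fsname


nids, fsname = parse_lustre_device("172.16.1.3@o2ib:172.16.1.4@o2ib:/data")
print(nids, fsname)
```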

OpenIB is working properly: we have a subnet manager running and can 
ping our Lustre OSS and MDS servers over IB.

Trying to mount /lustre generates the following error:
> mount.lustre: mount 172.16.1.3@o2ib:172.16.1.4@o2ib:/data at /lustre 
> failed: No such device
> Are the lustre modules loaded?
> Check /etc/modprobe.conf and /proc/filesystems
> Note 'alias lustre llite' should be removed from modprobe.conf
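
The /proc/filesystems check suggested by that message can be sketched as 
follows.  This is only an illustration: filesystem_registered is a 
hypothetical helper, and the sample text stands in for the real file, whose 
lines carry an optional "nodev" flag followed by the filesystem type.

```python
def filesystem_registered(proc_filesystems_text, fstype):
    """Return True if fstype is listed in the given /proc/filesystems text."""
    for line in proc_filesystems_text.splitlines():
        fields = line.split()
        # The filesystem type is the last field; "nodev" may precede it.
        if fields and fields[-1] == fstype:
            return True
    return False


# Sample contents (an assumption, not taken from the affected client).
sample = "nodev\tsysfs\nnodev\tproc\n\text3\n"
print(filesystem_registered(sample, "lustre"))

# On a real client one would read the file instead:
#   with open("/proc/filesystems") as f:
#       print(filesystem_registered(f.read(), "lustre"))
```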

dmesg shows that the ko2iblnd module cannot be loaded:
> Lustre: OBD class driver, http://www.lustre.org/
> Lustre:     Lustre Version: 1.8.4
> Lustre:     Build Version: 
> 1.8.4-20100723170646-PRISTINE-2.6.18-194.3.1.el5_lustre.1.8.4
> ko2iblnd: disagrees about version of symbol ib_fmr_pool_unmap
> ko2iblnd: Unknown symbol ib_fmr_pool_unmap
> ko2iblnd: disagrees about version of symbol ib_create_cq
> ko2iblnd: Unknown symbol ib_create_cq
> ko2iblnd: disagrees about version of symbol rdma_resolve_addr
> ko2iblnd: Unknown symbol rdma_resolve_addr
> ko2iblnd: disagrees about version of symbol ib_reg_phys_mr
> ko2iblnd: Unknown symbol ib_reg_phys_mr
> ko2iblnd: disagrees about version of symbol ib_create_fmr_pool
> ko2iblnd: Unknown symbol ib_create_fmr_pool
> ko2iblnd: disagrees about version of symbol ib_dereg_mr
> ko2iblnd: Unknown symbol ib_dereg_mr
> ko2iblnd: disagrees about version of symbol rdma_reject
> ko2iblnd: Unknown symbol rdma_reject
> ko2iblnd: disagrees about version of symbol rdma_disconnect
> ko2iblnd: Unknown symbol rdma_disconnect
> ko2iblnd: disagrees about version of symbol rdma_resolve_route
> ko2iblnd: Unknown symbol rdma_resolve_route
> ko2iblnd: disagrees about version of symbol rdma_bind_addr
> ko2iblnd: Unknown symbol rdma_bind_addr
> ko2iblnd: disagrees about version of symbol rdma_create_qp
> ko2iblnd: Unknown symbol rdma_create_qp
> ko2iblnd: disagrees about version of symbol ib_destroy_cq
> ko2iblnd: Unknown symbol ib_destroy_cq
> ko2iblnd: disagrees about version of symbol rdma_create_id
> ko2iblnd: Unknown symbol rdma_create_id
> ko2iblnd: disagrees about version of symbol rdma_listen
> ko2iblnd: Unknown symbol rdma_listen
> ko2iblnd: disagrees about version of symbol rdma_destroy_qp
> ko2iblnd: Unknown symbol rdma_destroy_qp
> ko2iblnd: disagrees about version of symbol ib_query_device
> ko2iblnd: Unknown symbol ib_query_device
> ko2iblnd: disagrees about version of symbol ib_get_dma_mr
> ko2iblnd: Unknown symbol ib_get_dma_mr
> ko2iblnd: disagrees about version of symbol ib_alloc_pd
> ko2iblnd: Unknown symbol ib_alloc_pd
> ko2iblnd: disagrees about version of symbol rdma_connect
> ko2iblnd: Unknown symbol rdma_connect
> ko2iblnd: disagrees about version of symbol ib_modify_qp
> ko2iblnd: Unknown symbol ib_modify_qp
> ko2iblnd: disagrees about version of symbol rdma_destroy_id
> ko2iblnd: Unknown symbol rdma_destroy_id
> ko2iblnd: disagrees about version of symbol rdma_accept
> ko2iblnd: Unknown symbol rdma_accept
> ko2iblnd: disagrees about version of symbol ib_dealloc_pd
> ko2iblnd: Unknown symbol ib_dealloc_pd
> ko2iblnd: disagrees about version of symbol ib_fmr_pool_map_phys
> ko2iblnd: Unknown symbol ib_fmr_pool_map_phys
> LustreError: 7461:0:(api-ni.c:1081:lnet_startup_lndnis()) Can't load 
> LND o2ib, module ko2iblnd, rc=256
> LustreError: 7461:0:(events.c:725:ptlrpc_init_portals()) network 
> initialisation failed
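
A log like the one above is easier to read once the "disagrees about 
version of symbol" lines are collapsed into one list per module.  Below is 
a minimal sketch of that, using a four-line excerpt of the output as sample 
input; the function name mismatched_symbols is hypothetical.  The kernel 
prints this message when the symbol version recorded in a module does not 
match the version exported by the running kernel or by the other loaded 
modules.

```python
import re

# Matches lines such as:
#   ko2iblnd: disagrees about version of symbol ib_create_cq
MISMATCH = re.compile(r"(\S+): disagrees about version of symbol (\S+)")


def mismatched_symbols(dmesg_text):
    """Return {module: sorted list of symbols} for version-mismatch lines."""
    found = {}
    for line in dmesg_text.splitlines():
        m = MISMATCH.search(line)
        if m:
            found.setdefault(m.group(1), set()).add(m.group(2))
    return {mod: sorted(syms) for mod, syms in found.items()}


sample = """\
ko2iblnd: disagrees about version of symbol ib_create_cq
ko2iblnd: Unknown symbol ib_create_cq
ko2iblnd: disagrees about version of symbol rdma_connect
ko2iblnd: Unknown symbol rdma_connect
"""
print(mismatched_symbols(sample))
```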

Am I missing something obvious here?

Thanks much.

-Ed

On 06/05/2011 05:48 AM, Wu, Yilei wrote:
> We have been using OFED 1.5.1 with Lustre 1.8.4 on a 400-node 
> cluster running RHEL 5.4, with no problems at all.
>
> One thing needs attention:
>
> If you use the default OFED 1.5.1, just install the RPM packages; there 
> is no need to build either Lustre or OFED.
>
> If you use a modified driver, such as BX-OFED 1.5.1, you may in some 
> cases need to recompile the Linux kernel with an increased stack size, 
> because Lustre and OFED are both stack-hungry and can exhaust the 
> stack, which hangs the system.
>
> YiLei
>
>
> On Thu, Jun 2, 2011 at 1:36 AM, Kevin Van Maren 
> <kevin.van.maren at oracle.com> wrote:
>
>     OFED 1.5.1 should work fine with Lustre 1.8.4, although I believe more
>     people are using the in-kernel OFED now: Lustre (finally) defaulted to
>     the in-kernel OFED for RedHat, so it is no longer _necessary_ to build
>     either OFED or Lustre.
>
>     Kevin
>
>
>     Edward Walter wrote:
>     > Hi List,
>     >
>     > We're getting ready to upgrade the OS/software stack on one of our
>     > clusters and I'm looking at which Lustre and OFED versions will
>     > work best.
>     >
>     > It looks like the changelog for 1.8.4 and the compatibility matrix
>     > have conflicting information.
>     >
>     > The Lustre compatibility matrix indicates that for Lustre 1.8.4, the
>     > highest OFED revision with o2iblnd support is 1.4.2:
>     > http://wiki.lustre.org/index.php/Lustre_Release_Information
>     >
>     > The changelog for 1.8.4 indicates that o2iblnd is supported with
>     > OFED 1.5.1:
>     > http://wiki.lustre.org/index.php/Change_Log_1.8#Changes_from_v1.8.3_to_v1.8.4
>     >
>     >
>     > Can someone clarify whether 1.8.4 supports o2iblnd with OFED 1.5.1?
>     > Are there any pitfalls to this configuration?  Has anyone found any
>     > instabilities with this configuration?
>     >
>     > Thanks much.
>     >
>     > -Ed Walter
>     > Carnegie Mellon University
>     > _______________________________________________
>     > Lustre-discuss mailing list
>     > Lustre-discuss at lists.lustre.org
>     > http://lists.lustre.org/mailman/listinfo/lustre-discuss
>     >
>