[Lustre-discuss] lustre ofed compatibility
Edward Walter
ewalter at cs.cmu.edu
Thu Jun 9 13:55:45 PDT 2011
Thanks for all of the advice here. We seem to be running into a hiccup
using Lustre 1.8.4 with O2IB and OFED 1.5.1.
First of all, our lustre servers are all up and running fine (using the
vendor OFED - 1.4.1). Our trouble is all client side.
We want to use a newer OFED (1.5.1) to potentially enable NFS over RDMA
(we have NFS servers in addition to lustre).
We installed the current Lustre 1.8.4 rpms from Sun/Oracle:
> kernel-2.6.18-194.3.1.el5_lustre.1.8.4
> lustre-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4
> lustre-modules-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4
>
> kernel-devel-2.6.18-194.3.1.el5_lustre.1.8.4
> kernel-headers-2.6.18-194.3.1.el5_lustre.1.8.4
We rebooted with kernel-2.6.18-194.3.1.el5_lustre.1.8.4.
Next we downloaded the OFED 1.5.1 sources and built the basic and hpc
packages. These built and installed without incident. I don't believe
the OpenFabrics group provides binary RPMs; otherwise, we would have
used them.
Here are the lustre/IB lines from our modprobe.conf:
> alias ib0 ib_ipoib
> alias net-pf-27 ib_sdp
> options lnet networks=o2ib
And our fstab:
> 172.16.1.3@o2ib:172.16.1.4@o2ib:/data /lustre lustre defaults,_netdev,localflock 0 0
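The first fstab field packs two things together: one or more server NIDs in the usual `address@network` form, separated by colons, followed by `:/` and the filesystem name. A minimal Python sketch of splitting such a device string is below; the function name `parse_lustre_device` is made up for illustration, and the sketch only checks the overall shape, not whether the addresses are reachable.

```python
def parse_lustre_device(device):
    """Split 'nid[:nid...]:/fsname' into (list of NIDs, fsname).

    Hypothetical helper for illustration; it validates only the
    general shape of the string.
    """
    # The filesystem name follows the last ':/' in the string.
    nid_part, sep, fsname = device.rpartition(":/")
    if not sep or not nid_part or not fsname:
        raise ValueError(f"not a Lustre device spec: {device!r}")

    nids = nid_part.split(":")
    for nid in nids:
        # Each NID should look like <address>@<network>, e.g. 172.16.1.3@o2ib.
        address, at, network = nid.partition("@")
        if not (address and at and network):
            raise ValueError(f"malformed NID: {nid!r}")
    return nids, fsname


if __name__ == "__main__":
    nids, fsname = parse_lustre_device("172.16.1.3@o2ib:172.16.1.4@o2ib:/data")
    print("NIDs:", nids)
    print("filesystem:", fsname)
```

Every NID here names the `o2ib` network, which has to match what `options lnet networks=` configures on the client.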
OpenIB is working properly: we have a subnet manager running and can
ping our Lustre OSS and MDS servers over IB.
Trying to mount /lustre generates the following error:
> mount.lustre: mount 172.16.1.3@o2ib:172.16.1.4@o2ib:/data at /lustre
> failed: No such device
> Are the lustre modules loaded?
> Check /etc/modprobe.conf and /proc/filesystems
> Note 'alias lustre llite' should be removed from modprobe.conf
dmesg shows that the ko2iblnd module cannot be loaded:
> Lustre: OBD class driver, http://www.lustre.org/
> Lustre: Lustre Version: 1.8.4
> Lustre: Build Version:
> 1.8.4-20100723170646-PRISTINE-2.6.18-194.3.1.el5_lustre.1.8.4
> ko2iblnd: disagrees about version of symbol ib_fmr_pool_unmap
> ko2iblnd: Unknown symbol ib_fmr_pool_unmap
> ko2iblnd: disagrees about version of symbol ib_create_cq
> ko2iblnd: Unknown symbol ib_create_cq
> ko2iblnd: disagrees about version of symbol rdma_resolve_addr
> ko2iblnd: Unknown symbol rdma_resolve_addr
> ko2iblnd: disagrees about version of symbol ib_reg_phys_mr
> ko2iblnd: Unknown symbol ib_reg_phys_mr
> ko2iblnd: disagrees about version of symbol ib_create_fmr_pool
> ko2iblnd: Unknown symbol ib_create_fmr_pool
> ko2iblnd: disagrees about version of symbol ib_dereg_mr
> ko2iblnd: Unknown symbol ib_dereg_mr
> ko2iblnd: disagrees about version of symbol rdma_reject
> ko2iblnd: Unknown symbol rdma_reject
> ko2iblnd: disagrees about version of symbol rdma_disconnect
> ko2iblnd: Unknown symbol rdma_disconnect
> ko2iblnd: disagrees about version of symbol rdma_resolve_route
> ko2iblnd: Unknown symbol rdma_resolve_route
> ko2iblnd: disagrees about version of symbol rdma_bind_addr
> ko2iblnd: Unknown symbol rdma_bind_addr
> ko2iblnd: disagrees about version of symbol rdma_create_qp
> ko2iblnd: Unknown symbol rdma_create_qp
> ko2iblnd: disagrees about version of symbol ib_destroy_cq
> ko2iblnd: Unknown symbol ib_destroy_cq
> ko2iblnd: disagrees about version of symbol rdma_create_id
> ko2iblnd: Unknown symbol rdma_create_id
> ko2iblnd: disagrees about version of symbol rdma_listen
> ko2iblnd: Unknown symbol rdma_listen
> ko2iblnd: disagrees about version of symbol rdma_destroy_qp
> ko2iblnd: Unknown symbol rdma_destroy_qp
> ko2iblnd: disagrees about version of symbol ib_query_device
> ko2iblnd: Unknown symbol ib_query_device
> ko2iblnd: disagrees about version of symbol ib_get_dma_mr
> ko2iblnd: Unknown symbol ib_get_dma_mr
> ko2iblnd: disagrees about version of symbol ib_alloc_pd
> ko2iblnd: Unknown symbol ib_alloc_pd
> ko2iblnd: disagrees about version of symbol rdma_connect
> ko2iblnd: Unknown symbol rdma_connect
> ko2iblnd: disagrees about version of symbol ib_modify_qp
> ko2iblnd: Unknown symbol ib_modify_qp
> ko2iblnd: disagrees about version of symbol rdma_destroy_id
> ko2iblnd: Unknown symbol rdma_destroy_id
> ko2iblnd: disagrees about version of symbol rdma_accept
> ko2iblnd: Unknown symbol rdma_accept
> ko2iblnd: disagrees about version of symbol ib_dealloc_pd
> ko2iblnd: Unknown symbol ib_dealloc_pd
> ko2iblnd: disagrees about version of symbol ib_fmr_pool_map_phys
> ko2iblnd: Unknown symbol ib_fmr_pool_map_phys
> LustreError: 7461:0:(api-ni.c:1081:lnet_startup_lndnis()) Can't load
> LND o2ib, module ko2iblnd, rc=256
> LustreError: 7461:0:(events.c:725:ptlrpc_init_portals()) network
> initialisation failed
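With this many lines it helps to reduce the log to a list of affected symbols. Below is a minimal Python sketch, assuming the message format shown above; the function name `mismatched_symbols` is made up for illustration. It tolerates a leading `> ` quote prefix so that text pasted from an email works too.

```python
import re

# Matches kernel log lines of the form shown above, e.g.
#   ko2iblnd: disagrees about version of symbol ib_create_cq
# Leading non-word characters (such as a "> " quote prefix) are skipped.
_MISMATCH = re.compile(
    r"^\W*(?P<module>\S+): disagrees about version of symbol (?P<symbol>\S+)\s*$"
)


def mismatched_symbols(log_text):
    """Return {module: sorted symbol names} for every version-mismatch line.

    Hypothetical helper for illustration only.
    """
    found = {}
    for line in log_text.splitlines():
        match = _MISMATCH.match(line)
        if match:
            found.setdefault(match["module"], set()).add(match["symbol"])
    return {module: sorted(symbols) for module, symbols in found.items()}


if __name__ == "__main__":
    sample = """\
ko2iblnd: disagrees about version of symbol ib_create_cq
ko2iblnd: Unknown symbol ib_create_cq
ko2iblnd: disagrees about version of symbol rdma_connect
ko2iblnd: Unknown symbol rdma_connect
"""
    for module, symbols in mismatched_symbols(sample).items():
        print(module, len(symbols), " ".join(symbols))
```

"Disagrees about version of symbol" is the kernel's module-versioning check reporting that the checksum recorded in the module does not match the one exported by the running kernel or its loaded modules. Every symbol in the log above is an `ib_*` or `rdma_*` call, which would be consistent with ko2iblnd having been built against a different InfiniBand stack than the OFED 1.5.1 modules now loaded.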
Am I missing something obvious here?
Thanks much.
-Ed
On 06/05/2011 05:48 AM, Wu, Yilei wrote:
> We have been using OFED 1.5.1 with Lustre 1.8.4 on a 400-node
> cluster based on RHEL 5.4, currently with no problems at all.
>
> One thing needs attention:
>
> If using the default OFED 1.5.1, just install from the RPM packages;
> there is no need to build either Lustre or OFED.
>
> If using a revised driver, such as BX-OFED 1.5.1, users may in some
> cases need to recompile the Linux kernel with an increased stack size,
> because Lustre and OFED are both stack-greedy and can use up the
> stack, leading to system hangs.
>
> YiLei
>
>
> On Thu, Jun 2, 2011 at 1:36 AM, Kevin Van Maren
> <kevin.van.maren at oracle.com> wrote:
>
> OFED 1.5.1 should work fine with Lustre 1.8.4, although I believe more
> people are using the in-kernel OFED now: Lustre (finally) defaulted to
> the in-kernel OFED for RedHat, so it is no longer _necessary_ to build
> either OFED or Lustre.
>
> Kevin
>
>
> Edward Walter wrote:
> > Hi List,
> >
> > We're getting ready to upgrade the OS/software stack on one of our
> > clusters and I'm looking at which Lustre and OFED versions will
> > work best.
> >
> > It looks like the changelog for 1.8.4 and the compatibility matrix
> > have conflicting information.
> >
> > The Lustre compatibility matrix indicates that on Lustre 1.8.4, the
> > highest OFED revision with o2iblnd support is 1.4.2:
> > http://wiki.lustre.org/index.php/Lustre_Release_Information
> >
> > The changelog for 1.8.4 indicates that o2iblnd is supported with
> > OFED 1.5.1:
> > http://wiki.lustre.org/index.php/Change_Log_1.8#Changes_from_v1.8.3_to_v1.8.4
> >
> >
> > Can someone clarify whether 1.8.4 supports o2iblnd with OFED 1.5.1?
> > Are there any pitfalls to this configuration? Has anyone found any
> > instabilities with this configuration?
> >
> > Thanks much.
> >
> > -Ed Walter
> > Carnegie Mellon University
> > _______________________________________________
> > Lustre-discuss mailing list
> > Lustre-discuss at lists.lustre.org
> > http://lists.lustre.org/mailman/listinfo/lustre-discuss
> >
>