[Lustre-discuss] RHEL5's OFED with lustre1.8.2 on IB

Brian J. Murrell Brian.Murrell at Sun.COM
Thu Apr 8 08:19:25 PDT 2010


On Thu, 2010-04-08 at 10:56 -0400, Lawrence Sorrillo wrote: 
> I am about to try to build lustre again as I am getting hangs with the 
> lustre mounts in my previous build.
> 
> "Apr 7 09:09:30 host0 kernel: LustreError: 
> 5270:0:(o2iblnd_cb.c:2883:kiblnd_check_txs()) Timed out tx: active_txs, 
> 9 seconds
> Apr 7 09:09:30 host0 kernel: LustreError: 
> 5270:0:(o2iblnd_cb.c:2945:kiblnd_check_conns()) Timed out RDMA with 
> 172.17.1.108@o2ib (84)"

What makes you think that this is a software problem and that rebuilding
the software stack will resolve it?  FWIW, every time I have seen this
type of problem reported, the fabric was flaky.
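
One rough way to see whether the timeouts cluster on particular peers
(which would point at a cable, port or HCA rather than at the software)
is to tally them from syslog. The following is only a sketch: it assumes
each message sits on a single line in the log (the wrapping in the quote
above comes from the mail client) and that the peer is written as a NID
such as 172.17.1.108@o2ib. The function name and the sample lines are
made up for illustration.

```python
import re
from collections import Counter

# Captures the peer NID from messages of the form
#   "...kiblnd_check_conns()) Timed out RDMA with <nid> (<n>)"
# The pattern is derived only from the log excerpt quoted in this thread.
TIMEOUT_RE = re.compile(r"kiblnd_check_conns\(\)\) Timed out RDMA with (\S+)")


def count_rdma_timeouts(lines):
    """Return a Counter mapping peer NID -> number of RDMA timeout messages."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative input only: the first line is the message quoted above,
# joined onto one line; the second is an unrelated line that is ignored.
sample = [
    "Apr 7 09:09:30 host0 kernel: LustreError: "
    "5270:0:(o2iblnd_cb.c:2945:kiblnd_check_conns()) "
    "Timed out RDMA with 172.17.1.108@o2ib (84)",
    "Apr 7 09:09:31 host0 kernel: some unrelated message",
]

for nid, hits in count_rdma_timeouts(sample).most_common():
    print(hits, nid)
```

To run it against a real log, pass an open file object in place of
`sample`. If one or two NIDs account for nearly all of the hits, look at
the links to those hosts before rebuilding anything.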

> Here is the plan. Lustre 1.8.2 on rhel5 x86_64 using the ofed in the rhel5 kernel.

If that is not already what you mean, why not just use the pre-built
packages that we have built and extensively tested in our QA department?

> I have gathered the following packages from the lustre site:
> e2fsprogs-1.41.6.sun1-0redhat.rhel5.x86_64.rpm
> kernel-2.6.18-164.6.1.0.1.el5.src.rpm

Why do you need a kernel src.rpm?

> lustre-client-1.8.2-2.6.18_164.6.1.0.1.el5_lustre.1.8.2.x86_64.rpm
> lustre-client-modules-1.8.2-2.6.18_164.6.1.0.1.el5_lustre.1.8.2.x86_64.rpm
> 
> I want to get the kernel-2.6.18-164.6.1.0.1.el5.x86_64.rpm binary from 
> kernel-2.6.18-164.6.1.0.1.el5.src.rpm.

Why not just use the binary kernel we provide instead of rebuilding your
own?  It's the *exact* same kernel that we used in our QA testing and
therefore a known quantity.

b.
