[Lustre-discuss] RHEL5's OFED with lustre1.8.2 on IB

Lawrence Sorrillo sorrillo at jlab.org
Thu Apr 8 08:43:10 PDT 2010


Brian:

I greatly appreciate your input. The IB connections for this set of
builds are SDR, while the rest of the fabric is either DDR or QDR. We have
one large fabric.
It appears that only the nodes with this build (and SDR connections)
are affected this way. I guess I can swap in a DDR card with a different
cable and IB port and see if this makes a difference. All the machines
built this way are experiencing the hangs, so I assumed it was not
hardware, although it could be hardware that they all share.
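
One rough way to test the shared-hardware theory is to tally the RDMA timeouts per peer NID across the affected clients and see whether they cluster on one server or port. This is only a sketch: the message format is assumed from the two log lines quoted further down, and the function name is made up for illustration. It accepts the NID written either as addr@net or in the archive's "addr at net" form.

```python
import re
from collections import Counter

# Pattern modelled on the "Timed out RDMA with <NID>" message quoted in
# this thread; the exact syslog layout is an assumption.
RDMA_TIMEOUT = re.compile(
    r"kiblnd_check_conns\(\)\)\s+Timed out RDMA with\s+(\S+?)(?:@|\s+at\s+)(\S+)"
)

def tally_rdma_timeouts(log_text):
    """Count RDMA timeout messages per peer NID (hypothetical helper)."""
    # Collapse all whitespace so a message wrapped across lines still matches.
    flat = " ".join(log_text.split())
    return Counter(f"{addr}@{net}" for addr, net in RDMA_TIMEOUT.findall(flat))

if __name__ == "__main__":
    sample = (
        "Apr 7 09:09:30 host0 kernel: LustreError: "
        "5270:0:(o2iblnd_cb.c:2945:kiblnd_check_conns()) Timed out RDMA with "
        "172.17.1.108@o2ib (84)\n"
    )
    for nid, count in tally_rdma_timeouts(sample).most_common():
        print(nid, count)
```

If every client times out against the same NID, that points at the server side or a shared switch port rather than at the client build.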

I can't find the pre-built kernel-2.6.18-164.6.1.0.1.el5.x86_64.rpm. I
only found the source (kernel-2.6.18-164.6.1.0.1.el5.src.rpm), which is
why I need to build the binary version. Do you have it
somewhere? I can't use the Lustre-patched version, as I have other
software to install that expects a stock kernel version. I am hoping to
use the pre-built lustre-client RPMs with the binary I build (hoping for
no module versioning complaints).

Overly hopeful?
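
A first-pass sanity check for the versioning worry is to compare the kernel release recorded in a module's vermagic string (as printed by `modinfo -F vermagic <module>`) with the release reported by `uname -r` on the rebuilt kernel. The sketch below only does the string comparison; the helper names and the sample vermagic string are illustrative, and matching releases do not by themselves guarantee that symbol versions line up.

```python
def kernel_release_from_vermagic(vermagic):
    """Return the kernel release, i.e. the first whitespace-separated
    field of a vermagic string (hypothetical helper)."""
    fields = vermagic.split()
    return fields[0] if fields else ""

def vermagic_matches(vermagic, running_release):
    """True if the module was built for the given kernel release string."""
    return kernel_release_from_vermagic(vermagic) == running_release

if __name__ == "__main__":
    # Illustrative values only, not taken from a real module.
    example_vermagic = "2.6.18-164.6.1.0.1.el5 SMP mod_unload gcc-4.1"
    print(vermagic_matches(example_vermagic, "2.6.18-164.6.1.0.1.el5"))
```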

~Lawrence


Brian J. Murrell wrote:
> On Thu, 2010-04-08 at 10:56 -0400, Lawrence Sorrillo wrote: 
>   
>> I am about to try to build lustre again as I am getting hangs with the 
>> lustre mounts in my previous build.
>>
>> "Apr 7 09:09:30 host0 kernel: LustreError: 
>> 5270:0:(o2iblnd_cb.c:2883:kiblnd_check_txs()) Timed out tx: active_txs, 
>> 9 seconds
>> Apr 7 09:09:30 host0 kernel: LustreError: 
>> 5270:0:(o2iblnd_cb.c:2945:kiblnd_check_conns()) Timed out RDMA with 
>> 172.17.1.108@o2ib (84)"
>>     
>
> What makes you think that this is a software problem and that rebuilding
> the software stack will resolve it?  FWIW, every time I have seen this
> type of problem reported, the fabric was flaky.
>
>   
>> Here is the plan. Lustre 1.8.2 on rhel5 x86_64 using the ofed in the rhel5 kernel.
>>     
>
> In case it's not what you mean, why don't you just use the pre-built
> packages that we have built and extensively tested in our QA department
> for you?
>
>   
>> I have gathered the following packages from the lustre site:
>> e2fsprogs-1.41.6.sun1-0redhat.rhel5.x86_64.rpm
>> kernel-2.6.18-164.6.1.0.1.el5.src.rpm
>>     
>
> Why do you need a kernel src.rpm?
>
>   
>> lustre-client-1.8.2-2.6.18_164.6.1.0.1.el5_lustre.1.8.2.x86_64.rpm
>> lustre-client-modules-1.8.2-2.6.18_164.6.1.0.1.el5_lustre.1.8.2.x86_64.rpm
>>
>> I want to get the kernel-2.6.18-164.6.1.0.1.el5.x86_64.rpm binary from 
>> kernel-2.6.18-164.6.1.0.1.el5.src.rpm.
>>     
>
> Why not just use the binary kernel we provide instead of rebuilding your
> own?  It's the *exact* same kernel that we used in our QA testing and
> therefore a known quantity.
>
> b.




