[Lustre-discuss] Lustre 1.6.5.1 + kernel-ib doesn't work

Danny Sternkopf dsternkopf at hpce.nec.com
Fri Aug 1 01:57:06 PDT 2008


Hi Brian,

We got the following messages when starting IB:

Jul 31 15:22:55 doss1 kernel: ib_mthca: Mellanox InfiniBand HCA driver v1.0 (February 28, 2008)
Jul 31 15:22:55 doss1 kernel: ib_mthca: Initializing 0000:20:00.0
Jul 31 15:22:55 doss1 kernel: GSI 24 sharing vector 0x92 and IRQ 24
Jul 31 15:22:55 doss1 kernel: ACPI: PCI Interrupt 0000:20:00.0[A] -> GSI 24 (level, low) -> IRQ 146
Jul 31 15:22:56 doss1 kernel: ib_mthca 0000:20:00.0: HCA FW version 3.1.000 is old (3.5.000 is current).
Jul 31 15:22:56 doss1 kernel: ib_mthca 0000:20:00.0: If you have problems, try updating your HCA FW.
Jul 31 15:22:56 doss1 kernel: ib_mthca 0000:20:00.0: NOP command failed to generate interrupt (IRQ 170).
Jul 31 15:22:56 doss1 kernel: ib_mthca 0000:20:00.0: Trying again with MSI/MSI-X disabled.
Jul 31 15:23:56 doss1 kernel: ib_mthca 0000:20:00.0: HW2SW_EQ failed (-11)
Jul 31 15:23:56 doss1 kernel: ib_mthca 0000:20:00.0: HW2SW_EQ returned status 0xff
Jul 31 15:23:56 doss1 kernel: ib_mthca 0000:20:00.0: HW2SW_MPT failed (-11)
Jul 31 15:23:56 doss1 kernel: ib_mthca 0000:20:00.0: HW2SW_EQ failed (-11)
Jul 31 15:23:56 doss1 kernel: ib_mthca 0000:20:00.0: HW2SW_EQ returned status 0xff
Jul 31 15:23:56 doss1 kernel: ib_mthca 0000:20:00.0: HW2SW_MPT failed (-11)
Jul 31 15:23:56 doss1 kernel: ib_mthca 0000:20:00.0: HW2SW_EQ failed (-11)
Jul 31 15:23:56 doss1 kernel: ib_mthca 0000:20:00.0: HW2SW_EQ returned status 0xff
Jul 31 15:23:56 doss1 kernel: ib_mthca 0000:20:00.0: HW2SW_MPT failed (-11)

We updated the HCA firmware, which resolved the problem. IB is now working.
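
For anyone who wants to check other nodes for the same warning, here is a rough sketch that scans saved kernel log text for the ib_mthca "HCA FW version ... is old" message quoted above. The function name find_old_firmware and the idea of feeding it log text as a string are my own assumptions for illustration; the pattern is taken only from the message format in the log above and may not match other driver versions.

```python
import re

# Pattern based solely on the ib_mthca message shown in the log above.
FW_RE = re.compile(
    r"ib_mthca (?P<pci>\S+): HCA FW version (?P<found>[\d.]+) is old "
    r"\((?P<current>[\d.]+) is current\)"
)


def find_old_firmware(log_text):
    """Return (pci_address, found_version, current_version) tuples.

    Hypothetical helper: log_text is kernel log output as one string.
    """
    # Mail clients often wrap long syslog lines, so collapse all
    # whitespace before matching to tolerate breaks inside a message.
    flat = " ".join(log_text.split())
    return [(m["pci"], m["found"], m["current"]) for m in FW_RE.finditer(flat)]
```

Calling it on the log above would report the device 0000:20:00.0 with firmware 3.1.000 against 3.5.000. It only detects the warning; it does not tell you whether the later HW2SW_EQ failures have the same cause.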

What about the second issue?
http://lists.lustre.org/pipermail/lustre-discuss/2008-June/007767.html

Is there any news?

Thank you and best regards,

Danny

Brian J. Murrell wrote:
> On Thu, 2008-07-31 at 16:08 +0200, Danny Sternkopf wrote:
>> Hi,
>>
>> I installed all the new Lustre 1.6.5.1 packages on a CentOS 5.1 system, and
>> if I start OpenIB the server crashes. It also can't be rebooted
>> until the kernel-ib RPM is uninstalled.
> 
> That sounds very suspect.
> 
>> Did anybody get it running?
> 
> Most certainly our QA department had it all running before we released
> it.
> 
> I suspect that you have some other problem masquerading as a
> problem with the OFED stack.
> 
> I'm afraid there is not much we can do to help you without seeing some
> logs or error messages or the like.  You might have to instrument your
> boot with some debugging to see where it's really getting stuck.
> 
> b.



