[Lustre-discuss] Lustre 1.6.5.1 + kernel-ib doesn't work
Danny Sternkopf
dsternkopf at hpce.nec.com
Fri Aug 1 01:57:06 PDT 2008
Hi Brian,
We got the following messages when starting IB:
Jul 31 15:22:55 doss1 kernel: ib_mthca: Mellanox InfiniBand HCA driver v1.0 (February 28, 2008)
Jul 31 15:22:55 doss1 kernel: ib_mthca: Initializing 0000:20:00.0
Jul 31 15:22:55 doss1 kernel: GSI 24 sharing vector 0x92 and IRQ 24
Jul 31 15:22:55 doss1 kernel: ACPI: PCI Interrupt 0000:20:00.0[A] -> GSI 24 (level, low) -> IRQ 146
Jul 31 15:22:56 doss1 kernel: ib_mthca 0000:20:00.0: HCA FW version 3.1.000 is old (3.5.000 is current).
Jul 31 15:22:56 doss1 kernel: ib_mthca 0000:20:00.0: If you have problems, try updating your HCA FW.
Jul 31 15:22:56 doss1 kernel: ib_mthca 0000:20:00.0: NOP command failed to generate interrupt (IRQ 170).
Jul 31 15:22:56 doss1 kernel: ib_mthca 0000:20:00.0: Trying again with MSI/MSI-X disabled.
Jul 31 15:23:56 doss1 kernel: ib_mthca 0000:20:00.0: HW2SW_EQ failed (-11)
Jul 31 15:23:56 doss1 kernel: ib_mthca 0000:20:00.0: HW2SW_EQ returned status 0xff
Jul 31 15:23:56 doss1 kernel: ib_mthca 0000:20:00.0: HW2SW_MPT failed (-11)
Jul 31 15:23:56 doss1 kernel: ib_mthca 0000:20:00.0: HW2SW_EQ failed (-11)
Jul 31 15:23:56 doss1 kernel: ib_mthca 0000:20:00.0: HW2SW_EQ returned status 0xff
Jul 31 15:23:56 doss1 kernel: ib_mthca 0000:20:00.0: HW2SW_MPT failed (-11)
Jul 31 15:23:56 doss1 kernel: ib_mthca 0000:20:00.0: HW2SW_EQ failed (-11)
Jul 31 15:23:56 doss1 kernel: ib_mthca 0000:20:00.0: HW2SW_EQ returned status 0xff
Jul 31 15:23:56 doss1 kernel: ib_mthca 0000:20:00.0: HW2SW_MPT failed (-11)
We then updated the HCA firmware, which resolved the problem. IB is now working.
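For anyone hitting the same symptoms, here is a minimal sketch of how the firmware warning in the log above could be picked out of a kernel log automatically. The function name find_old_firmware and the regular expression are illustrative assumptions based only on the message format shown in this log; they are not part of any Lustre or OFED tooling, and other driver versions may word the message differently.

```python
import re

# Pattern for the ib_mthca firmware warning, modelled on the log excerpt above.
# Assumption: other driver versions may phrase this message differently.
FW_WARNING = re.compile(
    r"ib_mthca (?P<pci>[0-9a-f:.]+): HCA FW version\s+"
    r"(?P<running>[\d.]+) is old \((?P<current>[\d.]+) is current\)"
)


def find_old_firmware(log_text):
    """Return a list of (pci_address, running_version, current_version)."""
    # Mail clients and syslog viewers may wrap long lines, so collapse all
    # whitespace runs into single spaces before matching.
    flat = " ".join(log_text.split())
    return [
        (m["pci"], m["running"], m["current"])
        for m in FW_WARNING.finditer(flat)
    ]


if __name__ == "__main__":
    sample = (
        "Jul 31 15:22:56 doss1 kernel: ib_mthca 0000:20:00.0: HCA FW version\n"
        "3.1.000 is old (3.5.000 is current).\n"
    )
    for pci, running, current in find_old_firmware(sample):
        print(f"{pci}: firmware {running} is older than {current}")
```

Running something like this over /var/log/messages before starting the IB stack would flag an outdated HCA firmware early, which in our case turned out to be the actual cause.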
What about the second issue?
http://lists.lustre.org/pipermail/lustre-discuss/2008-June/007767.html
Is there any news?
Thank you and best regards,
Danny
Brian J. Murrell wrote:
> On Thu, 2008-07-31 at 16:08 +0200, Danny Sternkopf wrote:
>> Hi,
>>
>> I installed all the new Lustre 1.6.5.1 packages on a CentOS 5.1 system, and
>> when I start OpenIB the server crashes. It also can't be rebooted
>> until the kernel-ib RPM is uninstalled.
>
> That sounds very suspect.
>
>> Did anybody get it running?
>
> Most certainly our QA department had it all running before we released
> it.
>
> I suspect that you have some other problem masquerading itself as a
> problem with the OFED stack.
>
> I'm afraid there is not much we can do to help you without seeing some
> logs or error messages or the like. You might have to instrument your
> boot with some debugging to see where it's really getting stuck.
>
> b.