[Lustre-discuss] lustre can not mounted problem

Aaron Knister aaron at iges.org
Wed Jan 9 06:55:55 PST 2008


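I don't know if the Voltaire IB stack is the same as OFED, but I'm
guessing it has a subnet manager. Check that it is running. A port
that sits in PORT_INITIALIZE with sm_lid=0x0000 has a physical link
but has not yet been configured by a subnet manager, so the card is
probably not broken. I've had similar issues when my subnet manager
crashed.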
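
For anyone who wants to script this check, here is a minimal Python
sketch (not part of the Voltaire tools) that reads vstat-style output
like the listing quoted below and flags ports that appear to be
waiting on a subnet manager. The names parse_vstat_ports and diagnose
are hypothetical, the key=value layout is assumed from the quoted
listing, and the PORT_ACTIVE spelling is an assumption, since only
PORT_INITIALIZE and PORT_DOWN appear in this thread.

```python
# Sketch only: field names are taken from the vstat listing in this thread.
SAMPLE = """\
1 HCA found:
        hca_id=InfiniHost_III_Ex0
        num_phys_ports=2
                port=1
                port_state=PORT_INITIALIZE
                sm_lid=0x0000
                port_lid=0x0000
                port=2
                port_state=PORT_DOWN
                sm_lid=0x0000
                port_lid=0x0000
"""


def parse_vstat_ports(text):
    """Return one dict per 'port=N' stanza found in vstat-style text."""
    ports = []
    current = None
    for raw in text.splitlines():
        # Drop email quote markers and indentation before parsing.
        line = raw.lstrip("> \t").strip()
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "port":
            current = {"port": int(value)}
            ports.append(current)
        elif current is not None:
            # Fields before the first port (HCA-level) are ignored here.
            current[key] = value
    return ports


def diagnose(port):
    """Map a parsed port dict to a short human-readable hint."""
    state = port.get("port_state", "UNKNOWN")
    sm_lid = int(port.get("sm_lid", "0x0"), 16)
    if state == "PORT_DOWN":
        return "no physical link; check cable and switch port"
    if state == "PORT_INITIALIZE":
        if sm_lid == 0:
            return "link is up but no subnet manager has configured the port"
        return "link is up; waiting for the subnet manager to finish setup"
    if state == "PORT_ACTIVE":  # assumed spelling, not shown in this thread
        return "port is usable"
    return "unrecognized state: " + state


for p in parse_vstat_ports(SAMPLE):
    print("port", p["port"], "->", diagnose(p))
```

To try it on a live node, the output of vstat could be piped into a
file and passed to parse_vstat_ports in place of SAMPLE.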

On Jan 9, 2008, at 3:08 AM, Changer Van wrote:

> The network connection is down. I cannot ping the other nodes.
> I ran the vstat command and found that one port_state is
> 'port_initialize'.
> What does 'port_initialize' mean? Does it mean my IB card is broken?
>
> 1 HCA found:
>         hca_id=InfiniHost_III_Ex0
>         pci_location={BUS=0x20,DEV/FUNC=0x00}
>         vendor_id=0x02C9
>         vendor_part_id=0x6282
>         hw_ver=0xA0
>         fw_ver=5.1.400
>         PSID=MT_0140000001
>         num_phys_ports=2
>                 port=1
>                 port_state=PORT_INITIALIZE
>                 sm_lid=0x0000
>                 port_lid=0x0000
>                 port_lmc=0x00
>                 max_mtu=2048
>                 port=2
>                 port_state=PORT_DOWN
>                 sm_lid=0x0000
>                 port_lid=0x0000
>                 port_lmc=0x00
>                 max_mtu=2048
> -- 
> Regards,
> Changer
>
> On Jan 9, 2008 3:27 AM, Klaus Steden <klaus.steden at thomson.net> wrote:
>
> If you're using IPoIB, you can use standard TCP/IP diagnostic tools  
> the same way you would on an Ethernet link (ifconfig, ping,  
> traceroute, telnet, etc.)
>
> If you're using a copper-to-optical converter in your data path as  
> well, the Emcore MIAs have link lights on them which will tell you  
> if a physical link is present (check the documentation). I know with  
> STP InfiniBand connectors, there is some ambiguity about terminology  
> with some vendors and manufacturers, and the fibre arrangement  
> doesn't provide a lot of wiggle room.
>
> Klaus
>
> On 1/7/08 7:56 PM, "Changer Van" <changerv at gmail.com> did etch on
> stone tablets:
>
>
>
> On Jan 8, 2008 1:35 AM, Isaac Huang <He.Huang at sun.com> wrote:
> On Mon, Jan 07, 2008 at 06:20:52PM +0800, Changer Van wrote:
> >    ......
> >    # dmesg
> >
> >    LustreError: 4273:0:(viblnd.c:1890:kibnal_startup())
> >
> >             Can't find an active port on InfiniHost_III_Ex0
>
> It means that viblnd couldn't find a port whose link state was active
> on the HCA InfiniHost_III_Ex0, i.e. no link on the device was usable.
>
> Were there any other error messages from viblnd before this one?
>
> There were no error messages, but there was a related message
> like 'ADDRCONF(NETDEV_UP):ipoib0: link is not ready'.
>
> Did you see this problem on just one node?
>
> There are four nodes which cannot mount the Lustre file system.
> The other nodes can mount Lustre but got the following error
> messages:
>
> # dmesg
> divert: not allocating divert_blk for non-ethernet device ipoib0
> ERROR   : IPOIB_UD : ipoib_ud_find_dev_by_dst:(ipoib_ud_arp.c):
>      ip_route_output_key(127.0.0.1) failed
>
> new: ipoib_allow_arp_joins: 1
> ERROR   : IPOIB_UD : ipoib_ud_find_dev_by_dst:(ipoib_ud_arp.c):
>      ip_route_output_key(11.0.0.4) failed
>
> ERROR   : IPOIB_UD : ipoib_ud_find_dev_by_dst:(ipoib_ud_arp.c):
>      ip_route_output_key(11.0.0.4) failed
>
> ERROR   : IPOIB_UD : ipoib_ud_find_dev_by_dst:(ipoib_ud_arp.c):
>      ip_route_output_key(11.0.0.4) failed
>
>
> How can I check the link on the device? Thanks in advance.
>
>
>
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at clusterfs.com
> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss

Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies

(301) 595-7000
aaron at iges.org



