[Lustre-discuss] Lustre can not be mounted issue

Changer Van changerv at gmail.com
Thu Jan 17 01:46:10 PST 2008


Hi all,

Our Lustre file system crashed after the entire system was rebooted and
can no longer be mounted. Some of the error messages are shown below:


On n01-ib0 (one of the clients)
-------------------------------

# /usr/sbin/lconf --node n01-ib0 /etc/lustre/config.xml
MDC: MDC_n01.local_mds_master2_MNT_n01-ib0_2
 23503_MNT_n01-ib0_2_a7896cb070 mds_master2_UUID
MDC: MDC_n01.local_mds_master2_MNT_n01-ib0_2
 23503_MNT_n01-ib0_2_a7896cb070
! /usr/sbin/lctl (255): IOC_PORTAL_DEL_UUID failed: Invalid argument

# dmesg
ERROR   : IPOIB_UD : ipoib_ud_find_dev_by_dst:(ipoib_ud_arp.c)
 :ip_route_output_key(127.0.0.1) failed
... ...
LustreError: 5208:0:(client.c:947:ptlrpc_expire_one_request())
 @@@ timeout (sent at 1200501512, 5s ago)  req@0000010117c34600
 x1/t0 o38->mds_master2_UUID@s03-ib0_UUID:12 lens 240/272 ref 1 fl
 Rpc:/0/0 rc 0/0
Lustre: Changing connection for MDC_n01.local_mds_master2_MNT_n01-ib0_2
 to s04-ib0_UUID/11.0.0.249@vib
LustreError: 5208:0:(client.c:947:ptlrpc_expire_one_request())
 @@@ timeout (sent at 1200501517, 5s ago)  req@0000010117c34600
 x3/t0 o38->mds_master2_UUID@s04-ib0_UUID:12 lens 240/272 ref 1 fl
 Rpc:/0/0 rc 0/0
Lustre: Changing connection for MDC_n01.local_mds_master2_MNT_n01-ib0_2
 to s03-ib0_UUID/11.0.0.250@vib
... ...
Lustre: Skipped 39 previous similar messages
ERROR   : AD_TAVOR : vvi_mlx_poll_for_completion:(adaptor_tavor.c):VLT:
 completion_status: 10 (MLX: 12, syndrom: 129), total err num: 5
 (not print flush errors)
LustreError: 4941:0:(events.c:53:request_out_callback()) @@@ type 4, status
 -113  req@000001011294f200 x2794/t0 o38->mds_master2_UUID@s03-ib0_UUID:12
 lens 240/272 ref 2 fl Rpc:/0/0 rc 0/0
LustreError: 4941:0:(events.c:53:request_out_callback())
 Skipped 14 previous similar messages
LustreError: 23731:0:(obd_config.c:333:class_cleanup()) OBD
 MDC_n01.local_mds_master2_MNT_n01-ib0_2 is still busy with 5 references
You should stop active file system users, or use the --force option to
cleanup.
LustreError: 23731:0:(obd_config.c:234:class_detach()) OBD device 2 still
 set up
LustreError: 23732:0:(lustre_peer.c:148:class_del_uuid()) delete
 non-existent uuid s03-ib0_UUID
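
A side note on the "status -113" in the request_out_callback() line
above: these status values are negative Linux errno codes, and errno 113
on Linux is EHOSTUNREACH ("No route to host"), which would fit the
routing suspicion. Below is a minimal Python sketch for pulling such
codes out of saved dmesg text. The helper name decode_statuses and the
small errno table are assumptions made for illustration; they are not
part of any Lustre tool.

```python
import re

# Linux errno values (asm-generic). Only a few network-related codes are
# listed here; this table is an illustrative assumption, not a full list.
LINUX_ERRNO = {
    110: ("ETIMEDOUT", "Connection timed out"),
    111: ("ECONNREFUSED", "Connection refused"),
    113: ("EHOSTUNREACH", "No route to host"),
}

# The value may sit on the next (wrapped) line, so allow any whitespace.
STATUS_RE = re.compile(r"status\s+(-\d+)")


def decode_statuses(log_text):
    """Return (code, name, description) for each negative 'status N' found."""
    results = []
    for match in STATUS_RE.finditer(log_text):
        code = int(match.group(1))
        name, desc = LINUX_ERRNO.get(-code, ("UNKNOWN", "not in table"))
        results.append((code, name, desc))
    return results


if __name__ == "__main__":
    sample = (
        "LustreError: 4941:0:(events.c:53:request_out_callback()) "
        "@@@ type 4, status\n -113  req"
    )
    for code, name, desc in decode_statuses(sample):
        print(code, name, desc)
```

Codes missing from the table are reported as UNKNOWN rather than
guessed, so the table can be extended as other values show up.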


On s03-ib0 (failover MDS with s04-ib0)
--------------------------------------

# traceroute 11.0.0.1
traceroute to 11.0.0.1 (11.0.0.1), 30 hops max, 46 byte packets
 1  n01-ib0 (11.0.0.1)  0.149 ms  0.086 ms  0.088 ms

# dmesg
ERROR   : IPOIB_UD : ipoib_ud_find_dev_by_dst:(ipoib_ud_arp.c):
 ip_route_output_key(127.0.0.1) failed
new: ipoib_allow_arp_joins: 1
Linux Kernel Card Services
  options:  [pci] [cardbus] [pm]
ERROR   : IPOIB_UD : ipoib_ud_find_dev_by_dst:(ipoib_ud_arp.c):
 ip_route_output_key(11.0.0.4) failed
... ...
Lustre: Added LNI 11.0.0.250@vib [8/128]
Lustre: 4362:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from
 12345-11.0.0.3@vib portal 12 match 3734 offset 0 length 240: 2
Lustre: 4362:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from
 12345-11.0.0.15@vib portal 12 match 3736 offset 0 length 240: 2

Error messages like 'ip_route_output_key(*) failed' suggest that the
routing or IPoIB interface configuration is probably wrong. However,
both the IPoIB interface configuration and the routing tables on the
nodes seem to be OK. Any help would be greatly appreciated.
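
To see which destinations the 'ip_route_output_key(*) failed' messages
actually refer to, the dmesg output from each node can be tallied per
address. The following is only a rough Python sketch under my own
assumptions: the function name route_failures is made up for
illustration, and the pattern is based solely on the message format
shown in the logs above.

```python
import re
from collections import Counter

# Matches e.g. "ip_route_output_key(11.0.0.4) failed" and captures the address.
ROUTE_FAIL_RE = re.compile(r"ip_route_output_key\(([^)]+)\)\s+failed")


def route_failures(log_text):
    """Count ip_route_output_key failures per destination address."""
    return Counter(ROUTE_FAIL_RE.findall(log_text))


if __name__ == "__main__":
    sample = (
        "ERROR   : IPOIB_UD : ipoib_ud_find_dev_by_dst:(ipoib_ud_arp.c):\n"
        " ip_route_output_key(127.0.0.1) failed\n"
        "ERROR   : IPOIB_UD : ipoib_ud_find_dev_by_dst:(ipoib_ud_arp.c):\n"
        " ip_route_output_key(11.0.0.4) failed\n"
        "ERROR   : IPOIB_UD : ipoib_ud_find_dev_by_dst:(ipoib_ud_arp.c):\n"
        " ip_route_output_key(127.0.0.1) failed\n"
    )
    for address, count in route_failures(sample).most_common():
        print(address, count)
```

Splitting the counts by address separates lookups for the loopback
address (127.0.0.1) from lookups for real peers such as 11.0.0.4, which
may help narrow down which route is missing.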

-- 
Regards,
Changer