[Lustre-discuss] LustreError: 15f-b, initial OST mount fails with "Input/output error"

aberoham at gmail.com
Thu Feb 21 16:47:57 PST 2008


I have a Lustre MGS/MDT hosting one Lustre filesystem with three OSTs
attached. When I try to attach a fourth OST, the mount command times out with
"mount.lustre: mount /dev/lustre2/ost at /mnt/data/ost failed: Input/output
error  Is the MGS running?" and I see the following on the OST's console.

I am able to mount the Lustre filesystem on this un-attachable OST node as a
client, and the node can ping the MGS/MDT (and vice versa).

# mkfs.lustre --reformat --fsname tmonster --ost --mgsnode=tm01@tcp0 \
    --mkfsoptions='-N 1200000' /dev/lustre2/ost
...
# mount -t lustre /dev/lustre2/ost /mnt/data/ost
mount.lustre: mount /dev/lustre2/ost at /mnt/data/ost failed: Input/output
error
Is the MGS running?
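Since the error asks whether the MGS is running, it also seems worth double-checking
LNet-level reachability from the OSS (a sketch using standard lctl subcommands; the
MGS NID 192.168.33.1@tcp is the one that appears in the dmesg output below):

```shell
# Show the NIDs this node advertises on LNet
lctl list_nids

# Ping the MGS over LNet (not ICMP); this exercises the same
# network path the OST registration RPC uses
lctl ping 192.168.33.1@tcp
```

These commands need a live Lustre/LNet stack on the node, so they are shown
here only as a transcript.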

# dmesg
Lustre: OBD class driver, info at clusterfs.com
        Lustre Version: 1.6.4.2
        Build Version:
1.6.4.2-19691231190000-PRISTINE-.cache.build.BUILD.lustre-kernel-2.6.18.lustre.linux-2.6.18-8.1.14.el5_lustre.1.6.4.2smp
Lustre: Added LNI 192.168.33.5 at tcp [8/256]
Lustre: Accept secure, port 988
Lustre: Lustre Client File System; info at clusterfs.com
kjournald starting.  Commit interval 5 seconds
LDISKFS FS on dm-2, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
LustreError: 2535:0:(obd_mount.c:247:ldd_parse()) cannot open
CONFIGS/mountdata: rc = -2
LustreError: 2535:0:(obd_mount.c:1252:server_kernel_mount()) premount parse
options failed: rc = -2
LustreError: 2535:0:(obd_mount.c:1533:server_fill_super()) Unable to mount
device /dev/lustre2/ost: -2
LustreError: 2535:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount
(-2)
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
IPv6 over IPv4 tunneling driver
bond0: no IPv6 routers present
eth0: no IPv6 routers present
eth1: no IPv6 routers present
kjournald starting.  Commit interval 5 seconds
LDISKFS FS on dm-2, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
LDISKFS FS on dm-2, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
LDISKFS FS on dm-2, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
LDISKFS-fs: file extents enabled
LDISKFS-fs: mballoc enabled
LustreError: 3157:0:(client.c:975:ptlrpc_expire_one_request()) @@@ timeout
(sent at 1203640178, 100s ago)  req at ffff81007f5db600 x2/t0 o253->
MGS at MGC192.168.33.1@tcp_0:26 lens 4672/4672 ref 1 fl Rpc:/0/0 rc 0/-22
LustreError: 166-1: MGC192.168.33.1 at tcp: Connection to service MGS via nid
192.168.33.1 at tcp was lost; in progress operations using this service will
fail.
LustreError: 3157:0:(obd_mount.c:954:server_register_target()) registration
with the MGS failed (-5)
LustreError: 3157:0:(obd_mount.c:1054:server_start_targets()) Required
registration failed for tmonster-OSTffff: -5
LustreError: 15f-b: Communication error with the MGS.  Is the MGS running?
Lustre: MGC192.168.33.1 at tcp: Reactivating import
LustreError: 3157:0:(obd_mount.c:1570:server_fill_super()) Unable to start
targets: -5
Lustre: MGC192.168.33.1 at tcp: Connection restored to service MGS using nid
192.168.33.1 at tcp.
LustreError: 3157:0:(obd_mount.c:1368:server_put_super()) no obd
tmonster-OSTffff
LustreError: 3157:0:(obd_mount.c:119:server_deregister_mount())
tmonster-OSTffff not registered
LustreError: 11-0: an error occurred while communicating with
192.168.33.1 at tcp. The mgs_disconnect operation failed with -107
LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0
lost
LDISKFS-fs: mballoc: 0 generated and it took 0
LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
Lustre: server umount tmonster-OSTffff complete
LustreError: 3157:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount
(-5)
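For reference, the negative rc values in these messages are standard kernel errno
codes: the -2 from ldd_parse() means CONFIGS/mountdata could not be opened (ENOENT),
while the later -5 is EIO, the "Input/output error" mount.lustre reports. A quick
decoding sketch, assuming python3 is available:

```shell
# Translate the rc values seen in the Lustre messages above
# into errno names and strings (-2, -5, -22, -107)
python3 -c '
import errno, os
for rc in (2, 5, 22, 107):
    print(-rc, errno.errorcode.get(rc, "?"), "--", os.strerror(rc))
'
```

On Linux this prints ENOENT for -2, EIO for -5, EINVAL for -22, and ENOTCONN
("Transport endpoint is not connected") for -107.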


MGS/MDT dmesg (some of these messages are certainly unrelated to the OST's
mount command):

Lustre: 2807:0:(ldlm_lib.c:519:target_handle_reconnect()) tmonster-MDT0000:
3d7d98f5-470c-4188-8023-6c0023150148 reconnecting
Lustre: 2807:0:(ldlm_lib.c:519:target_handle_reconnect()) Skipped 2 previous
similar messages
Lustre: 2806:0:(mds_reint.c:362:mds_steal_ack_locks()) Stealing 1 locks from
rs ffff81006ce30000 x895702.t458401 o101 NID 192.168.19.14 at tcp
Lustre: 2813:0:(service.c:751:ptlrpc_server_handle_reply()) All locks stolen
from rs ffff81006ce30000 x895702.t458401 o101 NID 192.168.19.14 at tcp
Lustre: 2797:0:(mds_reint.c:362:mds_steal_ack_locks()) Stealing 1 locks from
rs ffff81007f93d000 x666557.t458402 o101 NID 192.168.19.15 at tcp
Lustre: 2817:0:(service.c:751:ptlrpc_server_handle_reply()) All locks stolen
from rs ffff81007f93d000 x666557.t458402 o101 NID 192.168.19.15 at tcp
Lustre: 2688:0:(router.c:167:lnet_notify()) Ignoring prediction from
192.168.33.1 at tcp of 192.168.33.5 at tcp down 7854805405 seconds in the future
Lustre: 2780:0:(ldlm_lib.c:519:target_handle_reconnect()) MGS:
8eba281a-43bd-3fa2-2491-fbab892dc02c reconnecting
Lustre: 2780:0:(ldlm_lib.c:519:target_handle_reconnect()) Skipped 2 previous
similar messages
Lustre: MGS: haven't heard from client 8eba281a-43bd-3fa2-2491-fbab892dc02c
(at 192.168.33.5 at tcp) in 72 seconds. I think it's dead, and I am evicting
it.
LustreError: 2780:0:(mgs_handler.c:515:mgs_handle()) lustre_mgs: operation
251 on unconnected MGS
LustreError: 2780:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing
error (-107)  req at ffff810073150050 x7/t0 o251-><?>@<?>:-1 lens 128/0 ref 0
fl Interpret:/0/0 rc -107/0
LustreError: 2780:0:(ldlm_lib.c:1442:target_send_reply_msg()) Skipped 2
previous similar messages


Mounting the desired Lustre filesystem as a client on the OST node that is
having problems --

# mount -t lustre tm01@tcp0:/tmonster /mnt/tmonster
# df -h /mnt/tmonster
Filesystem            Size  Used Avail Use% Mounted on
tm01@tcp0:/tmonster   2.8T  181G  2.6T   7% /mnt/tmonster
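One more data point that may help: tunefs.lustre can read back the on-disk target
configuration that mkfs.lustre wrote, which confirms the fsname, target index, and
MGS NID this OST will present at registration time (a sketch; run on the OSS):

```shell
# Dump the target configuration stored in CONFIGS/mountdata
tunefs.lustre --print /dev/lustre2/ost
```

If this fails with the same -2, the mountdata on disk is missing or unreadable.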

I have replaced this OST's hardware (using the same boot/OST disks in a
different blade) to no avail. Any help is highly appreciated.

Thanks,
Abe