[Lustre-discuss] LustreError: 15f-b, initial OST mount fails with "Input/output error"

aberoham at gmail.com
Fri Mar 7 23:55:12 PST 2008


Following up on the message below: the problem was a broken Ethernet
configuration. One eth interface had an MTU of 1500 and the other was set to
9000, or similar silliness, I think. Pings made it through the bond0
balance-alb pair, but TCP/IP traffic apparently didn't. It was something
along those lines.
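
For anyone hitting the same symptom, here is a minimal sketch of how the slave MTUs could be compared. It assumes a Linux host that exposes the bonding driver's sysfs files (/sys/class/net/<bond>/bonding/slaves and /sys/class/net/<iface>/mtu); the function names are made up for this example and are not part of Lustre or any other tool. It also prints the largest ICMP payload that fits unfragmented in one IPv4 packet at each MTU. A default ping sends only a small payload, so it can succeed even when full-size frames do not get through.

```python
from pathlib import Path

IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


def slave_mtus(bond="bond0", sysfs_root="/sys/class/net"):
    """Return {interface: mtu} for every slave of the given bond."""
    root = Path(sysfs_root)
    slaves = (root / bond / "bonding" / "slaves").read_text().split()
    return {name: int((root / name / "mtu").read_text()) for name in slaves}


def mismatched(mtus):
    """True when the slaves do not all share the same MTU."""
    return len(set(mtus.values())) > 1


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


if __name__ == "__main__":
    try:
        mtus = slave_mtus("bond0")  # "bond0" is the name used in this thread
    except OSError as exc:
        print(f"could not read bonding info: {exc}")
    else:
        for name, mtu in sorted(mtus.items()):
            # Suggested manual check: ping with "don't fragment" set.
            print(f"{name}: mtu {mtu} -> ping -M do -s {max_ping_payload(mtu)} <mgs-host>")
        if mismatched(mtus):
            print("WARNING: bond slaves have different MTUs")
```

The printed ping command uses the iputils options -M do (prohibit fragmentation) and -s (payload size); <mgs-host> is a placeholder for whatever host you want to test against.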


On Thu, Feb 21, 2008 at 4:47 PM, <aberoham at gmail.com> wrote:

>
> I have a Lustre MGS/MDT hosting one Lustre filesystem with three OSTs
> attached. When trying to attach a fourth OST, I see the following on the
> OST's console, and the mount command times out with "mount.lustre: mount
> /dev/lustre2/ost at /mnt/data/ost failed: Input/output error  Is the MGS
> running?".
>
> I am able to mount the lustre filesystem on this un-attachable OST node as
> a client and am able to ping the MGS/MDT and vice versa.
>
> # mkfs.lustre --reformat --fsname tmonster --ost --mgsnode=tm01@tcp0 --mkfsoptions='-N 1200000' /dev/lustre2/ost
> ...
> # mount -t lustre /dev/lustre2/ost /mnt/data/ost
> mount.lustre: mount /dev/lustre2/ost at /mnt/data/ost failed: Input/output
> error
> Is the MGS running?
>
> # dmesg
> Lustre: OBD class driver, info at clusterfs.com
>         Lustre Version: 1.6.4.2
>         Build Version:
> 1.6.4.2-19691231190000-PRISTINE-.cache.build.BUILD.lustre-kernel-2.6.18.lustre.linux-2.6.18-8.1.14.el5_lustre.1.6.4.2smp
> Lustre: Added LNI 192.168.33.5@tcp [8/256]
> Lustre: Accept secure, port 988
> Lustre: Lustre Client File System; info at clusterfs.com
> kjournald starting.  Commit interval 5 seconds
> LDISKFS FS on dm-2, internal journal
> LDISKFS-fs: mounted filesystem with ordered data mode.
> LustreError: 2535:0:(obd_mount.c:247:ldd_parse()) cannot open
> CONFIGS/mountdata: rc = -2
> LustreError: 2535:0:(obd_mount.c:1252:server_kernel_mount()) premount
> parse options failed: rc = -2
> LustreError: 2535:0:(obd_mount.c:1533:server_fill_super()) Unable to mount
> device /dev/lustre2/ost: -2
> LustreError: 2535:0:(obd_mount.c:1924:lustre_fill_super()) Unable to
> mount  (-2)
> NET: Registered protocol family 10
> lo: Disabled Privacy Extensions
> IPv6 over IPv4 tunneling driver
> bond0: no IPv6 routers present
> eth0: no IPv6 routers present
> eth1: no IPv6 routers present
> kjournald starting.  Commit interval 5 seconds
> LDISKFS FS on dm-2, internal journal
> LDISKFS-fs: mounted filesystem with ordered data mode.
> kjournald starting.  Commit interval 5 seconds
> LDISKFS FS on dm-2, internal journal
> LDISKFS-fs: mounted filesystem with ordered data mode.
> kjournald starting.  Commit interval 5 seconds
> LDISKFS FS on dm-2, internal journal
> LDISKFS-fs: mounted filesystem with ordered data mode.
> LDISKFS-fs: file extents enabled
> LDISKFS-fs: mballoc enabled
> LustreError: 3157:0:(client.c:975:ptlrpc_expire_one_request()) @@@ timeout
> (sent at 1203640178, 100s ago)  req@ffff81007f5db600 x2/t0 o253->
> MGS@MGC192.168.33.1@tcp_0:26 lens 4672/4672 ref 1 fl Rpc:/0/0 rc 0/-22
> LustreError: 166-1: MGC192.168.33.1@tcp: Connection to service MGS via nid
> 192.168.33.1@tcp was lost; in progress operations using this service will
> fail.
> LustreError: 3157:0:(obd_mount.c:954:server_register_target())
> registration with the MGS failed (-5)
> LustreError: 3157:0:(obd_mount.c:1054:server_start_targets()) Required
> registration failed for tmonster-OSTffff: -5
> LustreError: 15f-b: Communication error with the MGS.  Is the MGS running?
> Lustre: MGC192.168.33.1@tcp: Reactivating import
> LustreError: 3157:0:(obd_mount.c:1570:server_fill_super()) Unable to start
> targets: -5
> Lustre: MGC192.168.33.1@tcp: Connection restored to service MGS using nid
> 192.168.33.1@tcp.
> LustreError: 3157:0:(obd_mount.c:1368:server_put_super()) no obd
> tmonster-OSTffff
> LustreError: 3157:0:(obd_mount.c:119:server_deregister_mount())
> tmonster-OSTffff not registered
> LustreError: 11-0: an error occurred while communicating with
> 192.168.33.1@tcp. The mgs_disconnect operation failed with -107
> LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
> LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks,
> 0 lost
> LDISKFS-fs: mballoc: 0 generated and it took 0
> LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
> Lustre: server umount tmonster-OSTffff complete
> LustreError: 3157:0:(obd_mount.c:1924:lustre_fill_super()) Unable to
> mount  (-5)
>
>
> MGS/MDT dmesg: (some of these are certainly unrelated to the OST's mount
> cmd)
>
> Lustre: 2807:0:(ldlm_lib.c:519:target_handle_reconnect())
> tmonster-MDT0000: 3d7d98f5-470c-4188-8023-6c0023150148 reconnecting
> Lustre: 2807:0:(ldlm_lib.c:519:target_handle_reconnect()) Skipped 2
> previous similar messages
> Lustre: 2806:0:(mds_reint.c:362:mds_steal_ack_locks()) Stealing 1 locks
> from rs ffff81006ce30000 x895702.t458401 o101 NID 192.168.19.14@tcp
> Lustre: 2813:0:(service.c:751:ptlrpc_server_handle_reply()) All locks
> stolen from rs ffff81006ce30000 x895702.t458401 o101 NID 192.168.19.14@tcp
> Lustre: 2797:0:(mds_reint.c:362:mds_steal_ack_locks()) Stealing 1 locks
> from rs ffff81007f93d000 x666557.t458402 o101 NID 192.168.19.15@tcp
> Lustre: 2817:0:(service.c:751:ptlrpc_server_handle_reply()) All locks
> stolen from rs ffff81007f93d000 x666557.t458402 o101 NID 192.168.19.15@tcp
> Lustre: 2688:0:(router.c:167:lnet_notify()) Ignoring prediction from
> 192.168.33.1@tcp of 192.168.33.5@tcp down 7854805405 seconds in the future
> Lustre: 2780:0:(ldlm_lib.c:519:target_handle_reconnect()) MGS:
> 8eba281a-43bd-3fa2-2491-fbab892dc02c reconnecting
> Lustre: 2780:0:(ldlm_lib.c:519:target_handle_reconnect()) Skipped 2
> previous similar messages
> Lustre: MGS: haven't heard from client
> 8eba281a-43bd-3fa2-2491-fbab892dc02c (at 192.168.33.5@tcp) in 72 seconds.
> I think it's dead, and I am evicting it.
> LustreError: 2780:0:(mgs_handler.c:515:mgs_handle()) lustre_mgs: operation
> 251 on unconnected MGS
> LustreError: 2780:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@
> processing error (-107)  req@ffff810073150050 x7/t0 o251-><?>@<?>:-1 lens
> 128/0 ref 0 fl Interpret:/0/0 rc -107/0
> LustreError: 2780:0:(ldlm_lib.c:1442:target_send_reply_msg()) Skipped 2
> previous similar messages
>
>
> Mounting the desired lustre filesystem as client on the OST that is having
> problems --
>
> # mount -t lustre tm01@tcp0:/tmonster     /mnt/tmonster
> # df -h /mnt/tmonster
> Filesystem            Size  Used Avail Use% Mounted on
> tm01@tcp0:/tmonster   2.8T  181G  2.6T   7% /mnt/tmonster
>
> I have replaced this OST's hardware (using the same boot/OST disks in a
> different blade) to no avail. Any help is highly appreciated.
>
> Thanks,
> Abe
>
>
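
As a reading aid for the quoted logs: the negative rc values are kernel-style return codes, so their absolute values can be looked up as errno numbers. The sketch below does that lookup with Python's standard errno and os modules. The helper name decode_rc is made up for this example, and the names it reports depend on the platform's errno table (Linux is assumed here, since that is where these logs came from). On Linux, -5 decodes to EIO, which matches the "Input/output error" that mount.lustre reported.

```python
import errno
import os


def decode_rc(rc):
    """Map a kernel-style negative return code to (errno name, message)."""
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # Return codes that appear in the quoted OST and MGS logs.
    for rc in (-2, -5, -22, -107):
        name, text = decode_rc(rc)
        print(f"rc = {rc}: {name} ({text})")
```

This only translates the numbers; it does not say why the MGS registration timed out, which in this case turned out to be the network configuration.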