[lustre-discuss] Lustre 2.5.3 - OST unable to connect to MGS

Murshid Azman murshid.azman at gmail.com
Wed Oct 7 23:43:26 PDT 2015


Hello Lustre gurus,

Recently, one of our OSSes had a faulty RAID card (3ware), which corrupted
both the root filesystem and the Lustre OST.

We then reinstalled the OS, fsck'd the Lustre OST using a backup superblock
(the primary one was corrupted) and recreated the journal (which was also
corrupted). We now have a bunch of files in lost+found, as seen when
mounting the OST as ldiskfs.

However, mounting the Lustre OST fails with the following errors:

Oct  7 13:01:45 OSS50 kernel: LDISKFS-fs (sdb): mounted filesystem with
ordered data mode. quota=off. Opts:
Oct  7 13:01:48 OSS50 kernel: LustreError: 137-5: Lustre-OST003b_UUID: not
available for connect from 172.16.4.66@tcp (no target). If you are running
an HA pair check that the target is mounted on the other server.
Oct  7 13:01:48 OSS50 kernel: LustreError: Skipped 5 previous similar
messages
Oct  7 13:01:48 OSS50 kernel: LustreError: 137-5: Lustre-OST003b_UUID: not
available for connect from 172.16.250.59@tcp (no target). If you are
running an HA pair check that the target is mounted on the other server.
Oct  7 13:01:48 OSS50 kernel: LustreError: Skipped 3 previous similar
messages
Oct  7 13:01:51 OSS50 kernel: LustreError: 137-5: Lustre-OST003b_UUID: not
available for connect from 172.16.7.199@tcp (no target). If you are running
an HA pair check that the target is mounted on the other server.
Oct  7 13:01:51 OSS50 kernel: LustreError: Skipped 15 previous similar
messages
Oct  7 13:01:55 OSS50 kernel: LustreError: 137-5: Lustre-OST003b_UUID: not
available for connect from 172.16.250.173@tcp (no target). If you are
running an HA pair check that the target is mounted on the other server.
Oct  7 13:01:55 OSS50 kernel: LustreError: Skipped 19 previous similar
messages
Oct  7 13:02:04 OSS50 kernel: LustreError: 137-5: Lustre-OST003b_UUID: not
available for connect from 172.16.5.114@tcp (no target). If you are running
an HA pair check that the target is mounted on the other server.
Oct  7 13:02:04 OSS50 kernel: LustreError: Skipped 49 previous similar
messages
Oct  7 13:02:04 OSS50 kernel: LustreError: 0-0: Trying to start OBD
Lustre-OST003b_UUID using the wrong disk <85>. Were the /dev/ assignments
rearranged?
Oct  7 13:02:04 OSS50 kernel: LustreError:
16002:0:(obd_config.c:572:class_setup()) setup Lustre-OST003b failed (-22)
Oct  7 13:02:04 OSS50 kernel: LustreError:
16002:0:(obd_config.c:1591:class_config_llog_handler()) MGC172.16.0.251@tcp:
cfg command failed: rc = -22
Oct  7 13:02:04 OSS50 kernel: Lustre:    cmd=cf003 0:Lustre-OST003b  1:dev
2:0  3:f
Oct  7 13:02:04 OSS50 kernel: LustreError: 15b-f: MGC172.16.0.251@tcp: The
configuration from log 'Lustre-OST003b'failed from the MGS (-22).  Make
sure this client and the MGS are running compatible versions of Lustre.
Oct  7 13:02:05 OSS50 kernel: LustreError: 15c-8: MGC172.16.0.251@tcp: The
configuration from log 'Lustre-OST003b' failed (-22). This may be the
result of communication errors between this node and the MGS, a bad
configuration, or other errors. See the syslog for more information.
Oct  7 13:02:05 OSS50 kernel: LustreError:
15976:0:(obd_mount_server.c:1252:server_start_targets()) failed to start
server Lustre-OST003b: -22
Oct  7 13:02:05 OSS50 kernel: LustreError:
15976:0:(obd_mount_server.c:1735:server_fill_super()) Unable to start
targets: -22
Oct  7 13:02:05 OSS50 kernel: Lustre: Lustre-OST003b: Not available for
connect from 172.16.5.116@tcp (not set up)
Oct  7 13:02:05 OSS50 kernel: LustreError:
15976:0:(obd_mount_server.c:845:lustre_disconnect_lwp())
Lustre-MDT0000-lwp-OST003b: Can't end config log Lustre-client.
Oct  7 13:02:05 OSS50 kernel: LustreError:
15976:0:(obd_mount_server.c:1420:server_put_super()) Lustre-OST003b: failed
to disconnect lwp. (rc=-2)
Oct  7 13:02:05 OSS50 kernel: LustreError:
15976:0:(obd_config.c:619:class_cleanup()) Device 135 not setup
Oct  7 13:02:05 OSS50 kernel: Lustre: server umount Lustre-OST003b complete
Oct  7 13:02:05 OSS50 kernel: LustreError:
15976:0:(obd_mount.c:1324:lustre_fill_super()) Unable to mount /dev/sdb
(-22)
Oct  7 13:02:05 OSS50 kernel: Lustre: Skipped 1 previous similar message
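
For anyone reading the log: the negative numbers are negated Linux errno
values. Below is a minimal Python sketch that pulls the return codes out of
a few of the message tails above and decodes them. The helper names
(decode_rc, summarize) and the regular expression are illustrative
assumptions, not part of Lustre.

```python
import errno
import re

# Message tails copied from the log above (timestamps and source
# locations dropped, wrapped lines rejoined).
LOG_TAILS = [
    "setup Lustre-OST003b failed (-22)",
    "cfg command failed: rc = -22",
    "failed to start server Lustre-OST003b: -22",
    "failed to disconnect lwp. (rc=-2)",
]

# Matches a negative number introduced by "rc =", "(" or ":".
RC_RE = re.compile(r"(?:rc\s*=\s*|\(|:\s*)(-\d+)")


def decode_rc(rc):
    """Return the symbolic errno name for a negated kernel return code."""
    return errno.errorcode.get(-rc, "UNKNOWN")


def summarize(lines):
    """Map each distinct return code found in the lines to its errno name."""
    codes = {}
    for line in lines:
        match = RC_RE.search(line)
        if match:
            rc = int(match.group(1))
            codes[rc] = decode_rc(rc)
    return codes


if __name__ == "__main__":
    print(summarize(LOG_TAILS))
```

For these lines it reports -22 as EINVAL and -2 as ENOENT. In the log
above, every failure on the mount path returns -22, and only the lwp
disconnect during cleanup returns -2.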

Any ideas?

I would think that we can eliminate the configuration errors by doing a
writeconf, but since this is a potentially destructive operation, I'd like
to check with you experts first to see whether anyone has experienced
something like this.

Thank you,
Murshid.