[lustre-discuss] Cannot mount mdt or osts after upgrade

Shane Nehring snehring at iastate.edu
Wed Jun 27 15:32:13 PDT 2018


Hello all,

I've been unable to mount the mdt or osts for a volume since upgrading
to 2.10.4 yesterday (previously 2.10.2).

on the mdt I'm getting:
mount.lustre: mount store/metadata-store at /mnt/metadata-store failed:
File exists
and in the kernel:
Lustre: MGS: Logs for fs newwork were removed by user request.  All
servers must be restarted in order to regenerate the logs: rc = 0
Lustre: newwork-MDT0000: Imperative Recovery enabled, recovery window
shrunk from 300-900 down to 150-900
LustreError: 31955:0:(mdt_handler.c:6167:mdt_iocontrol())
newwork-MDT0000: Aborting recovery for device
LustreError: 31955:0:(obd_mount_server.c:1879:server_fill_super())
Unable to start targets: -17
Lustre: Failing over newwork-MDT0000
Lustre: server umount newwork-MDT0000 complete
LustreError: 31955:0:(obd_mount.c:1582:lustre_fill_super()) Unable to
mount  (-17)


on the osts I'm getting:

mount.lustre: mount store/ost at /lustre/ost1 failed: Operation already
in progress
The target service is already running. (store/ost)

kernel:
Mounting /lustre/ost1...
Starting SYSV: Part of the lustre file system....
Started SYSV: Part of the lustre file system..
Lustre: Lustre: Build Version: 2.10.4
LustreError: 13556:0:(ofd_dev.c:251:ofd_stack_fini()) header@ffff9eba6a2bbb00[0x0, 1, [0x1:0x0:0x0] hash exist]{
LustreError: 13556:0:(ofd_dev.c:251:ofd_stack_fini()) ....local_storage@ffff9eba6a2bbb50
LustreError: 13556:0:(ofd_dev.c:251:ofd_stack_fini()) ....osd-zfs@ffff9eba58ed03a8osd-zfs-object@ffff9eba58ed03a8
LustreError: 13556:0:(ofd_dev.c:251:ofd_stack_fini()) } header@ffff9eba6a2bbb00
LustreError: 13556:0:(ofd_dev.c:251:ofd_stack_fini()) header@ffff9eba7bfd1d40[0x0, 1, [0x200000003:0x0:0x0] hash exist]{
LustreError: 13556:0:(ofd_dev.c:251:ofd_stack_fini()) ....local_storage@ffff9eba7bfd1d90
LustreError: 13556:0:(ofd_dev.c:251:ofd_stack_fini()) ....osd-zfs@ffff9eba61f0b5a0osd-zfs-object@ffff9eba61f0b5a0
LustreError: 13556:0:(ofd_dev.c:251:ofd_stack_fini()) } header@ffff9eba7bfd1d40
LustreError: 13556:0:(ofd_dev.c:251:ofd_stack_fini()) header@ffff9eba6a2bb8c0[0x0, 1, [0x200000003:0x2:0x0] hash exist]{
LustreError: 13556:0:(ofd_dev.c:251:ofd_stack_fini()) ....local_storage@ffff9eba6a2bb910
LustreError: 13556:0:(ofd_dev.c:251:ofd_stack_fini()) ....osd-zfs@ffff9eba58ed0000osd-zfs-object@ffff9eba58ed0000
LustreError: 13556:0:(ofd_dev.c:251:ofd_stack_fini()) } header@ffff9eba6a2bb8c0
LustreError: 13556:0:(ofd_dev.c:251:ofd_stack_fini()) header@ffff9eba6a2ba300[0x0, 1, [0xa:0x0:0x0] hash exist]{
LustreError: 13556:0:(ofd_dev.c:251:ofd_stack_fini()) ....local_storage@ffff9eba6a2ba350
LustreError: 13556:0:(ofd_dev.c:251:ofd_stack_fini()) ....osd-zfs@ffff9eba58ed09c0osd-zfs-object@ffff9eba58ed09c0
LustreError: 13556:0:(ofd_dev.c:251:ofd_stack_fini()) } header@ffff9eba6a2ba300
LustreError: 13556:0:(ofd_dev.c:251:ofd_stack_fini()) header@ffff9eba6a2bba40[0x0, 1, [0xa:0x10:0x0] hash exist]{
LustreError: 13556:0:(ofd_dev.c:251:ofd_stack_fini()) ....local_storage@ffff9eba6a2bba90
LustreError: 13556:0:(ofd_dev.c:251:ofd_stack_fini()) ....osd-zfs@ffff9eba56508618osd-zfs-object@ffff9eba56508618
LustreError: 13556:0:(ofd_dev.c:251:ofd_stack_fini()) } header@ffff9eba6a2bba40
LustreError: 13556:0:(obd_config.c:558:class_setup()) setup
newwork-OST0000 failed (-17)
LustreError: 13556:0:(obd_config.c:1682:class_config_llog_handler())
MGC172.16.100.254@o2ib: cfg command failed: rc = -17
Lustre:    cmd=cf003 0:newwork-OST0000  1:dev  2:0  3:f
LustreError: 15c-8: MGC172.16.100.254@o2ib: The configuration from log
'newwork-OST0000' failed (-17). This may be the result of communication
errors between this node and the MGS, a bad configuration, or other
errors. See the syslog for more information.
LustreError: 13213:0:(obd_mount_server.c:1386:server_start_targets())
failed to start server newwork-OST0000: -17
LustreError: 13213:0:(obd_mount_server.c:1879:server_fill_super())
Unable to start targets: -17
LustreError: 13213:0:(obd_config.c:609:class_cleanup()) Device 3 not setup
Lustre: server umount newwork-OST0000 complete
LustreError: 13213:0:(obd_mount.c:1582:lustre_fill_super()) Unable to
mount store/ost (-17)
mount.lustre: mount store/ost at /lustre/ost1 failed: File exists
lustre-ost1.mount mount process exited, code=exited status=17
Failed to mount /lustre/ost1.
Unit lustre-ost1.mount entered failed state.
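(As an aside, the -17 / status=17 in all of these messages is just the standard Linux errno EEXIST, i.e. "File exists" — Lustre returns errnos negated. A quick snippet to decode it, purely for illustration:)

```python
import errno
import os

# Lustre log messages report negative errno values; -17 maps to EEXIST.
code = 17
print(errno.errorcode[code])  # symbolic name: EEXIST
print(os.strerror(code))      # human-readable: File exists
```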

I've attached the output of 'lctl debug_kernel' as well.


I've tried sending the contents of the mdt to a new dataset (created
with the --replace option, which I pulled from
https://jira.whamcloud.com/browse/LUDOC-161), but it results in the same
error. I've also tried running tunefs.lustre --writeconf a few times, to
no avail.
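For reference, the writeconf sequence I was attempting is roughly the one from the Lustre Operations Manual (mountpoints are the ones from my setup above; this is a sketch, not my verbatim shell history):

```shell
# Regenerating Lustre config logs with writeconf (per the Lustre
# Operations Manual). Order matters throughout.

# 1. Unmount everything: clients first, then OSTs, then the MDT/MGS.
umount /mnt/lustre            # on each client (example client mountpoint)
umount /lustre/ost1           # on each OSS
umount /mnt/metadata-store    # on the MDS/MGS

# 2. Run writeconf on every target to regenerate its config log.
tunefs.lustre --writeconf store/metadata-store   # MDT (MGS co-located)
tunefs.lustre --writeconf store/ost              # repeat for each OST

# 3. Remount in order: MGS/MDT first, then OSTs, then clients.
mount -t lustre store/metadata-store /mnt/metadata-store
mount -t lustre store/ost /lustre/ost1
```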
Any suggestions?

This is a working-data filesystem, so it's not life or death that it be
recovered, but I am making a best effort to recover what's there. Is
there any hope for recovery?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lustre_debug.txt.gz
Type: application/gzip
Size: 41536 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20180627/f0db340c/attachment-0001.bin>

