[lustre-discuss] Lustre 2.8.0 - MDT/MGT failing to mount
Mohr Jr, Richard Frank (Rick Mohr)
rmohr at utk.edu
Thu May 4 08:01:09 PDT 2017
Did you try doing a writeconf to regenerate the config logs for the file system?
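
For reference, a writeconf is normally done with every client and every target unmounted: run tunefs.lustre --writeconf on the MDT first, then on each OST, and remount in the same order (combined MGS/MDT, then OSTs) so that each target re-registers and the configuration logs are rebuilt. The sketch below only prints that sequence and executes nothing. /dev/sdb appears to be the MDT device, going by the LDISKFS line in the log excerpt; the OST devices, node labels, and mount points are hypothetical placeholders.

```python
"""Dry-run sketch: print the order of operations for a Lustre writeconf.

Nothing here is executed against a file system. The OST devices, node
labels, and mount points are hypothetical placeholders.
"""


def writeconf_plan(mdt_device, ost_devices,
                   mdt_mount="/mnt/mdt", ost_mount_prefix="/mnt/ost"):
    """Return an ordered list of (node, command) pairs.

    Assumes every client and every target is already unmounted.
    """
    plan = []
    # Step 1: regenerate the config logs, MDT first, then each OST.
    plan.append(("MDS", f"tunefs.lustre --writeconf {mdt_device}"))
    for dev in ost_devices:
        plan.append(("OSS", f"tunefs.lustre --writeconf {dev}"))
    # Step 2: remount in order so each target re-registers with the MGS.
    plan.append(("MDS", f"mount -t lustre {mdt_device} {mdt_mount}"))
    for i, dev in enumerate(ost_devices):
        plan.append(("OSS", f"mount -t lustre {dev} {ost_mount_prefix}{i}"))
    return plan


if __name__ == "__main__":
    for node, cmd in writeconf_plan("/dev/sdb", ["/dev/sdc", "/dev/sdd"]):
        print(f"[{node}] {cmd}")
```

Clients are mounted last, once all OSTs are back. Keeping the plan as data makes it easy to review the order before typing anything on the servers.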
Senior HPC System Administrator
National Institute for Computational Sciences
> On May 4, 2017, at 10:03 AM, Steve Barnet <barnet at icecube.wisc.edu> wrote:
> Hi all,
> This is Lustre 2.8.0 community edition, combined MGS/MDT.
> I was adding storage to a filesystem and mistakenly duplicated an
> index for one of the OSTs at creation time. Since these OSTs were
> new and no data had been written, I made the mistake of reformatting
> the affected OSTs (including the first one I successfully mounted).
> When I tried to remount the newly formatted OST, the MDS kernel
> panicked (log attached). After a device-level backup and an e2fsck,
> I can mount the MDT as ldiskfs. e2fsck did correct some orphaned
> inodes, but those appear to be user files only, nothing from the
> Lustre metadata files themselves.
> However, the MDT/MGT still will not mount. The logs indicate
> that the original definition of the duplicated OST still exists
> somewhere. I checked the CONFIGS directory, and indeed there was
> a file associated with the OST in question. I copied that file
> out of the CONFIGS directory and attempted to mount the MDT/MGT
> again, but no change.
> The logs read:
> May 4 06:41:22 lfs4-mds kernel: Lustre: MGS: Connection restored to MGC10.128.11.174@tcp1_0 (at 0@lo)
> May 4 06:41:22 lfs4-mds kernel: LustreError: 12300:0:(genops.c:334:class_newdev()) Device lfs4-OST000e-osc-MDT0000 already exists at 22, won't add
> May 4 06:41:22 lfs4-mds kernel: LustreError: 12300:0:(obd_config.c:370:class_attach()) Cannot create device lfs4-OST000e-osc-MDT0000 of type osp : -17
> May 4 06:41:22 lfs4-mds kernel: LustreError: 12300:0:(obd_config.c:1666:class_config_llog_handler()) MGC10.128.11.174@tcp1: cfg command failed: rc = -17
> May 4 06:41:22 lfs4-mds kernel: Lustre: cmd=cf001 0:lfs4-OST000e-osc-MDT0000 1:osp 2:lfs4-MDT0000-mdtlov_UUID
> May 4 06:41:22 lfs4-mds kernel:
> May 4 06:41:22 lfs4-mds kernel: LustreError: 15c-8: MGC10.128.11.174@tcp1: The configuration from log 'lfs4-MDT0000' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
> May 4 06:41:22 lfs4-mds kernel: LustreError: 12213:0:(obd_mount_server.c:1309:server_start_targets()) failed to start server lfs4-MDT0000: -17
> May 4 06:41:22 lfs4-mds kernel: LustreError: 12213:0:(obd_mount_server.c:1798:server_fill_super()) Unable to start targets: -17
> May 4 06:41:22 lfs4-mds kernel: Lustre: Failing over lfs4-MDT0000
> May 4 06:41:28 lfs4-mds kernel: Lustre: 12213:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1493898082/real 1493898082] req@ffff8803113459c0 x1566404887184424/t0(0) o251->MGC10.128.11.174@tcp1@0@lo:26/25 lens 224/224 e 0 to 1 dl 1493898088 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
> May 4 06:41:28 lfs4-mds kernel: Lustre: server umount lfs4-MDT0000 complete
> May 4 06:41:28 lfs4-mds kernel: LustreError: 12213:0:(obd_mount.c:1426:lustre_fill_super()) Unable to mount (-17)
> May 4 06:45:04 lfs4-mds kernel: LDISKFS-fs (sdb): mounted filesystem with ordered data mode. quota=on. Opts:
> Again, no data was written to these. I was poking around a bit with
> the procedure for fixing a bad LAST_ID. From what I was able to
> piece together, it doesn't look like the MDT has any notion of
> precreated objects on this OST yet, so I suspect something
> in mountdata, perhaps.
> Any ideas?
> Thanks much!
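
A note on reading the log excerpt above: the -17 that propagates through every failure line is a negated Linux errno, and 17 is EEXIST. The first LustreError line names the device that already exists, and the four hex digits after "OST" in that name are the OST index. The sketch below decodes both. The regular expressions are assumptions based only on the two log lines quoted in this message, not on a documented Lustre log format.

```python
import errno
import os
import re

# Copied from the log excerpt in this thread.
LOG_LINE = ("LustreError: 12300:0:(genops.c:334:class_newdev()) "
            "Device lfs4-OST000e-osc-MDT0000 already exists at 22, won't add")

# Assumed patterns, derived only from the lines quoted above.
DEVICE_RE = re.compile(r"Device (?P<dev>\S+) already exists at (?P<slot>\d+)")
OSC_NAME_RE = re.compile(
    r"^(?P<fs>.+?)-OST(?P<idx>[0-9a-fA-F]{4})-osc-(?P<mdt>MDT[0-9a-fA-F]{4})$")


def decode_rc(rc):
    """Map a negative kernel return code to (errno name, description)."""
    code = abs(rc)
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def parse_duplicate(line):
    """Extract the conflicting device and its OST index from a log line."""
    found = DEVICE_RE.search(line)
    if found is None:
        return None
    info = {"device": found.group("dev"), "slot": int(found.group("slot"))}
    name = OSC_NAME_RE.match(info["device"])
    if name is not None:
        info["fsname"] = name.group("fs")
        info["ost_index"] = int(name.group("idx"), 16)  # index is hexadecimal
        info["mdt"] = name.group("mdt")
    return info


if __name__ == "__main__":
    print(decode_rc(-17))
    print(parse_duplicate(LOG_LINE))
```

Read this way, the failure is the MDT replaying its configuration log and trying to attach an OSC device for OST index 0x000e (14) a second time, which would be consistent with the log still holding a record for the original OST alongside the reformatted one.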