[lustre-discuss] Lustre 2.8.0 - MDT/MGT failing to mount

Steve Barnet barnet at icecube.wisc.edu
Thu May 4 07:03:43 PDT 2017


Hi all,

   This is Lustre 2.8.0 community edition, combined MGS/MDT.

I was adding storage to a filesystem and mistakenly duplicated an
index for one of the OSTs at creation time. Since these OSTs were
new and no data had been written, I made the mistake of reformatting
the affected OSTs (including the first one I successfully mounted).
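
   For context, the new OSTs were formatted roughly along these lines
(device name and mount point are placeholders); the mistake was reusing
an --index value, 0x0e (14), that already belonged to an existing OST:

  mkfs.lustre --ost --fsname=lfs4 --mgsnode=10.128.11.174@tcp1 \
      --index=0x0e /dev/sdX     # index 14 (0x0e) was already in use
  mount -t lustre /dev/sdX /mnt/lfs4-ost000e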

   When I tried to remount the newly formatted OST, the MDS kernel
panicked (log attached). After a device level backup and an e2fsck,
I can mount the MDT as ldiskfs. e2fsck did correct some orphaned
inodes, but those appear to be user files only, nothing from the
Lustre metadata files themselves.
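
   For the record, the recovery steps so far were roughly as follows
(the backup target and mount point are placeholders):

  # device-level backup of the MDT before doing anything else
  dd if=/dev/sdb of=/backup/lfs4-mdt.img bs=4M

  # the e2fsck pass that cleared the orphaned inodes
  e2fsck -f /dev/sdb

  # after that, the MDT mounts cleanly as plain ldiskfs
  mount -t ldiskfs /dev/sdb /mnt/mdt-ldiskfs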

   However, the MDT/MGT still will not mount. The logs indicate
that the original definition of the duplicated OST still exists
somewhere. I checked the CONFIGS directory, and indeed there was
a file associated with the OST in question. I copied that file
out of the CONFIGS directory and attempted to mount the MDT/MGT
again, but no change.
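
   For reference, the CONFIGS check goes roughly like this with the MDT
mounted as ldiskfs (paths are placeholders; llog_reader simply dumps a
configuration log so you can see whether the stale OST000e entries are
still recorded):

  ls /mnt/mdt-ldiskfs/CONFIGS/
  # lfs4-MDT0000  lfs4-client  lfs4-OST000e  mountdata  ...

  # dump the MDT0000 config log and look for the old OST000e records
  llog_reader /mnt/mdt-ldiskfs/CONFIGS/lfs4-MDT0000 | grep OST000e

  # keep a copy of the per-OST config file before moving it aside
  cp /mnt/mdt-ldiskfs/CONFIGS/lfs4-OST000e /root/CONFIGS-saved/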

The logs read:

May  4 06:41:22 lfs4-mds kernel: Lustre: MGS: Connection restored to 
MGC10.128.11.174@tcp1_0 (at 0@lo)
May  4 06:41:22 lfs4-mds kernel: LustreError: 
12300:0:(genops.c:334:class_newdev()) Device lfs4-OST000e-osc-MDT0000 
already exists at 22, won't add
May  4 06:41:22 lfs4-mds kernel: LustreError: 
12300:0:(obd_config.c:370:class_attach()) Cannot create device 
lfs4-OST000e-osc-MDT0000 of type osp : -17
May  4 06:41:22 lfs4-mds kernel: LustreError: 
12300:0:(obd_config.c:1666:class_config_llog_handler()) 
MGC10.128.11.174@tcp1: cfg command failed: rc = -17
May  4 06:41:22 lfs4-mds kernel: Lustre:    cmd=cf001 
0:lfs4-OST000e-osc-MDT0000  1:osp  2:lfs4-MDT0000-mdtlov_UUID
May  4 06:41:22 lfs4-mds kernel:
May  4 06:41:22 lfs4-mds kernel: LustreError: 15c-8: 
MGC10.128.11.174@tcp1: The configuration from log 'lfs4-MDT0000' failed 
(-17). This may be the result of communication errors between this node 
and the MGS, a bad configuration, or other errors. See the syslog for 
more information.
May  4 06:41:22 lfs4-mds kernel: LustreError: 
12213:0:(obd_mount_server.c:1309:server_start_targets()) failed to start 
server lfs4-MDT0000: -17
May  4 06:41:22 lfs4-mds kernel: LustreError: 
12213:0:(obd_mount_server.c:1798:server_fill_super()) Unable to start 
targets: -17
May  4 06:41:22 lfs4-mds kernel: Lustre: Failing over lfs4-MDT0000
May  4 06:41:28 lfs4-mds kernel: Lustre: 
12213:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has 
timed out for slow reply: [sent 1493898082/real 1493898082] 
req@ffff8803113459c0 x1566404887184424/t0(0) 
o251->MGC10.128.11.174@tcp1@0@lo:26/25 lens 224/224 e 0 to 1 dl 
1493898088 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
May  4 06:41:28 lfs4-mds kernel: Lustre: server umount lfs4-MDT0000 complete
May  4 06:41:28 lfs4-mds kernel: LustreError: 
12213:0:(obd_mount.c:1426:lustre_fill_super()) Unable to mount  (-17)
May  4 06:45:04 lfs4-mds kernel: LDISKFS-fs (sdb): mounted filesystem 
with ordered data mode. quota=on. Opts:


Again, no data was ever written to these OSTs. I was poking around
a bit with the procedure for fixing a bad LAST_ID. From what I was
able to piece together, the MDT does not appear to have any notion
of precreated objects on this OST yet, so I suspect something in
the mountdata.
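
   The sort of checks I mean are roughly these, run against the
ldiskfs-mounted MDT (I am assuming lov_objid is still the per-OST
LAST_ID table on 2.8, so treat this as a sketch):

  # 8-byte counters, one per OST index; a missing or zero slot for
  # index 14 would mean the MDT has nothing recorded for OST000e yet
  od -Ax -td8 /mnt/mdt-ldiskfs/lov_objid

  # print the target's mountdata (index, flags, parameters) offline
  tunefs.lustre --dryrun /dev/sdb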

Any ideas?

Thanks much!

Best,

---Steve

-------------- next part --------------
May  3 14:20:31 lfs4-mds kernel: Lustre: MGS: Connection restored to a5bce68d-fd2c-8bd8-c20b-e5713dc99a03 (at 10.128.11.157@tcp1)
May  3 14:20:31 lfs4-mds kernel: Lustre: Skipped 1 previous similar message
May  3 14:20:58 lfs4-mds kernel: Lustre: lfs4-MDT0000: Connection restored to 10.128.11.157@tcp1 (at 10.128.11.157@tcp1)
May  3 14:20:58 lfs4-mds kernel: Lustre: Skipped 1 previous similar message
May  3 14:21:24 lfs4-mds kernel: Lustre: lfs4-MDT0000: Connection restored to a5bce68d-fd2c-8bd8-c20b-e5713dc99a03 (at 10.128.11.157@tcp1)
May  3 14:25:33 lfs4-mds kernel: Lustre: MGS: Connection restored to fcadb8d1-5c7f-143e-145e-c580fa091b56 (at 10.128.11.156@tcp1)
May  3 14:25:43 lfs4-mds kernel: Lustre: 2223:0:(mgc_request.c:1680:mgc_process_recover_log()) Process recover log lfs4-mdtir error -22
May  3 14:25:43 lfs4-mds kernel: LustreError: 5156:0:(ldlm_lib.c:462:client_obd_setup()) can't add initial connection
May  3 14:25:43 lfs4-mds kernel: LustreError: 5156:0:(osp_dev.c:1145:osp_init0()) lfs4-OST000e-osc-MDT0000: can't setup obd: rc = -2
May  3 14:25:43 lfs4-mds kernel: LustreError: 5156:0:(obd_config.c:578:class_setup()) setup lfs4-OST000e-osc-MDT0000 failed (-2)
May  3 14:25:43 lfs4-mds kernel: LustreError: 5156:0:(obd_config.c:1666:class_config_llog_handler()) MGC10.128.11.174@tcp1: cfg command failed: rc = -2
May  3 14:25:43 lfs4-mds kernel: Lustre:    cmd=cf003 0:lfs4-OST000e-osc-MDT0000  1:lfs4-OST000e_UUID  2:0@<0:0>  
May  3 14:25:43 lfs4-mds kernel: 
May  3 14:27:26 lfs4-mds kernel: Lustre: MGS: Connection restored to lfs4-MDT0000-lwp-OST000e_UUID (at 10.128.11.156@tcp1)
May  3 14:27:26 lfs4-mds kernel: Lustre: Skipped 1 previous similar message
May  3 14:27:26 lfs4-mds kernel: LustreError: 140-5: Server lfs4-OST000e requested index 14, but that index is already in use. Use --writeconf to force
May  3 14:27:26 lfs4-mds kernel: LustreError: 29874:0:(mgs_handler.c:460:mgs_target_reg()) Failed to write lfs4-OST000e log (-98)
May  3 14:27:36 lfs4-mds kernel: LustreError: 5721:0:(obd_config.c:798:class_add_conn()) try to add conn on immature client dev
May  3 14:27:36 lfs4-mds kernel: LustreError: 5721:0:(lod_lov.c:243:lod_add_device()) ASSERTION( obd->obd_lu_dev->ld_site == lod->lod_dt_dev.dd_lu_dev.ld_site ) failed: 
May  3 14:27:36 lfs4-mds kernel: LustreError: 5721:0:(lod_lov.c:243:lod_add_device()) LBUG
May  3 14:27:36 lfs4-mds kernel: Pid: 5721, comm: llog_process_th
May  3 14:27:36 lfs4-mds kernel: 
May  3 14:27:36 lfs4-mds kernel: Call Trace:
May  3 14:27:36 lfs4-mds kernel: [<ffffffffa06a1875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
May  3 14:27:36 lfs4-mds kernel: [<ffffffffa06a1e77>] lbug_with_loc+0x47/0xb0 [libcfs]
May  3 14:27:36 lfs4-mds kernel: [<ffffffffa11fa887>] lod_add_device+0x1da7/0x1fe0 [lod]
May  3 14:27:36 lfs4-mds kernel: [<ffffffff8129c48e>] ? simple_strtol+0xe/0x20
May  3 14:27:36 lfs4-mds kernel: [<ffffffff8129c793>] ? vsscanf+0x2f3/0x770
May  3 14:27:36 lfs4-mds kernel: [<ffffffff8129c28c>] ? simple_strtoull+0x2c/0x50
May  3 14:27:36 lfs4-mds kernel: [<ffffffffa11f07b9>] lod_process_config+0x1339/0x1540 [lod]
May  3 14:27:36 lfs4-mds kernel: [<ffffffffa07e5d65>] ? keys_fill+0xd5/0x1b0 [obdclass]
May  3 14:27:36 lfs4-mds kernel: [<ffffffffa07e643b>] ? lu_context_init+0x8b/0x160 [obdclass]
May  3 14:27:36 lfs4-mds kernel: [<ffffffffa07d7d05>] class_process_config+0x2225/0x24c0 [obdclass]
May  3 14:27:36 lfs4-mds kernel: [<ffffffff810a185c>] ? remove_wait_queue+0x3c/0x50
May  3 14:27:36 lfs4-mds kernel: [<ffffffffa07d984a>] class_config_llog_handler+0xc1a/0x1d50 [obdclass]
May  3 14:27:36 lfs4-mds kernel: [<ffffffff8153a65e>] ? mutex_lock+0x1e/0x50
May  3 14:27:36 lfs4-mds kernel: [<ffffffffa079e3ca>] llog_process_thread+0x94a/0x1040 [obdclass]
May  3 14:27:36 lfs4-mds kernel: [<ffffffffa079efc5>] llog_process_thread_daemonize+0x45/0x70 [obdclass]
May  3 14:27:36 lfs4-mds kernel: [<ffffffffa079ef80>] ? llog_process_thread_daemonize+0x0/0x70 [obdclass]
May  3 14:27:36 lfs4-mds kernel: [<ffffffff810a0fce>] kthread+0x9e/0xc0
May  3 14:27:36 lfs4-mds kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
May  3 14:27:36 lfs4-mds kernel: [<ffffffff810a0f30>] ? kthread+0x0/0xc0
May  3 14:27:36 lfs4-mds kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20

