[lustre-discuss] OST CONFIG/*-client damaged
thhsieh at twcp1.phys.ntu.edu.tw
Thu Feb 17 00:40:06 PST 2022
After some investigation, we have confirmed that, when the broken OST
is mounted with ldiskfs, its /CONFIGS/*-client file is damaged.
So far we do not know how to regenerate it. We would like to ask whether
the following procedure would work. Please comment and help us.
1. Mount the broken OST with ldiskfs, and make a full backup of all
the contents, including every file's extended attributes, i.e.,
getfattr -R -d -m '.*' -P . > /tmp/ea.bak
2. Reformat the broken OST partition as a new OST, with all the
correct parameter settings (MGS IP, failover IP, file system
name, OST index, etc.). Presumably, the correct
/CONFIGS/*-client will be created this way.
3. Do a file-level restore of the original data from our
backup, but leave the new /CONFIGS/*-client untouched. The extended
attributes will also be restored from the /tmp/ea.bak dump made in step 1.
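The three steps above could be sketched as the shell functions below. This is only an illustration under assumptions, not a tested recovery procedure: the function names, the backup directory layout (ost.tar, ea.bak) and the use of tar for the file-level copy are hypothetical choices, and the -e hex option is added so that binary attribute values survive the dump. The backup directory must be given as an absolute path.

```shell
#!/bin/sh
# Sketch only: function names and the backup layout are hypothetical.

# Step 1: back up files and extended attributes from the ldiskfs-mounted OST.
# $1 = ldiskfs mount point of the broken OST, $2 = backup directory (absolute).
backup_ost() {
    ( cd "$1" || exit 1
      # Dump all extended attributes; -e hex keeps binary values intact.
      getfattr -R -d -m '.*' -e hex -P . > "$2/ea.bak" &&
      tar -c --sparse -f "$2/ost.tar" . )
}

# Step 2 is not scripted here: reformat the partition as a new OST with the
# original file system name, OST index and MGS settings.

# Step 3a: restore the files onto the reformatted OST, skipping CONFIGS so
# that the newly generated configuration files are left alone.
# $1 = backup directory, $2 = ldiskfs mount point of the new OST.
restore_files() {
    tar -x -p -f "$1/ost.tar" --exclude='./CONFIGS' -C "$2"
}

# Step 3b: reapply the extended attributes recorded in step 1.
# $1 = backup directory (absolute), $2 = ldiskfs mount point of the new OST.
restore_xattrs() {
    ( cd "$2" || exit 1
      setfattr --restore="$1/ea.bak" )
}
```

Note that ea.bak also lists the files under CONFIGS, so restore_xattrs would reapply their old attributes to the new files; whether those entries should be removed from the dump first is left open in this sketch.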
Please comment on whether this method is OK. If you know of
other ways, please kindly let us know. Any help will be very
much appreciated.
On Tue, Feb 15, 2022 at 06:31:32PM +0800, Tung-Han Hsieh wrote:
> Dear All,
> We encountered a problem mounting a damaged OST partition, as described
> below.
> The OST partition suffered serious hard disk damage, which was sent to
> a data rescue company to try to recover the data as much as possible.
> After that, we ran
> tunefs.lustre --writeconf /dev/<device_name>
> to clear the logs on the MGT, MDT, and all OSTs, and tried to mount the
> Lustre file system. But the damaged OST partition could not be mounted,
> and gave the following error message:
> mount.lustre: mount /dev/<dev> at /Lustre/ost failed: Invalid argument
> This may have multiple causes.
> Are the mount options correct?
> Check the syslog for more info.
> The dmesg message of OST server has the following error:
> LustreError: 157-3: Trying to start OBD lfs2-OST0006-UUID using the wrong disk. Were the /dev/ assignments rearranged?
> LustreError: 36047:0:(obd_config.c:559:class_setup()) setup lfs2-OST0006 failed (-22)
> LustreError: 36047:0:(obd_config.c:1835:class_config_llog_handler()) MGC172.16.31.231@o2ib: cfg command failed: rc = -22
> Lustre: cmd=cf003 0:lfs2-OST0006 1:dev 2:0 3:f
> LustreError: 15b-f: MGC172.16.31.231@o2ib: The configuration from log 'lfs2-OST0006' failed from the MGS (-22). Make sure this client and the MGS are running compatible versions of Lustre.
> LustreError: 36034:0:(obd_mount_server.c:1386:server_start_targets()) failed to start server lfs2-OST0006: -22
> LustreError: 36034:0:(obd_mount_server.c:1939:server_fill_super()) Unable to start targets: -22
> LustreError: 36034:0:(obd_config.c:610:class_cleanup()) Device 12 not setup
> Lustre: server umount lfs2-OST0006 complete
> LustreError: 36034:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount <dev> (-22)
> The dmesg in MGS/MDT server has the following error:
> Lustre: MGS: Regenerating lfs2-OST0006 log by user request: rc = 0
> Lustre: Found index 6 for lfs2-OST0006, updating log
> Lustre: Client log for lfs2-OST0006 was not updated; writeconf the MDT first to regenerate it.
> We mounted the good OST and the bad OST with ldiskfs, and compared the
> files found in each partition. We found the following discrepancy:
> -rw-r--r-- 1 root root 60656 Aug 31 16:10 /mnt/bad_OST/CONFIGS/lfs2-client
> -rw-r--r-- 1 root root 75416 Aug 31 16:10 /mnt/good_OST/CONFIGS/lfs2-client
> So we suspect that, despite the hard work of the data rescue company,
> the /CONFIGS/lfs2-client file of the bad OST was not successfully
> recovered, which leads to the problem.
> Here is the question: is it possible to regenerate this file? Also,
> are there other tips we have missed for system recovery?
> Any suggestion is much appreciated.
> Thank you very much.
> Best Regards,