[lustre-discuss] OST recovery

Andreas Dilger adilger at whamcloud.com
Tue Mar 31 18:10:33 PDT 2020


On Mar 29, 2020, at 20:04, Gong-Do Hwang <grover.hwang at gmail.com> wrote:

Thanks Andreas,

I ran "mkfs.lustre --ost --reformat --fsname lfs_home --index 6 --mgsnode 10.10.0.10@o2ib --servicenode 10.10.0.13@o2ib --failnode 10.10.0.14@o2ib /dev/mapper/mpathx", and at that time /dev/mapper/mpathx was mounted and serving as an OST under the FS "lfs". The lfs FS ran well until I umounted /dev/mapper/mpathx in order to restart the mgt/mgs.

The issue here is that the "--reformat" option overrides the check for a filesystem already existing on the device.  It should not normally be used.
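As a quick sanity check (just a suggestion, reusing the device name from your commands), you can ask tunefs.lustre whether a device already contains Lustre data before formatting it:

    # report the existing Lustre configuration without changing anything
    tunefs.lustre --dryrun /dev/mapper/mpathx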


And after I re-mounted the OST I got the message "mount.lustre FATAL: failed to write local files: Invalid argument
mount.lustre: mount /dev/mapper/mpathx at /lfs/ost8 failed: Invalid argument", and the "tunefs.lustre --dryrun /dev/mapper/mpathx" output is:
"tunefs.lustre --dryrun /dev/mapper/mpathx
checking for existing Lustre data: found
Reading CONFIGS/mountdata

   Read previous values:
Target:     lfs-OST0008

This shows that the device was previously part of the "lfs" filesystem at index 8.  While it is possible to change the filesystem name, the OST index should never change, so there is no tool for this.

Two things need to be done.  You can rewrite the filesystem label with "e2label /dev/mapper/mpathx lfs-OST0008".  Then you need to rebuild the "CONFIGS/mountdata" file.

The easiest way to generate a new mountdata file is to run "mkfs.lustre" with the same options as the original OST on a temporary device (e.g. a loopback device), but add the "--replace" option so that the OST doesn't try to add itself to the filesystem as a new OST.  Then mount the temporary and original OSTs as type ldiskfs and copy the file CONFIGS/mountdata from the temporary OST to the original OST to replace the broken one (it is probably a good idea to make a backup of the broken copy first).  A rough sketch of the whole procedure is below.
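This is only a sketch under a few assumptions: the temporary target is a file-backed loopback device (/dev/loop0, /tmp/ost-tmp.img), the mount points /mnt/tmp and /mnt/ost and the backup path /root/mountdata.broken are chosen for the occasion, and the mkfs.lustre parameters must be whatever the original lfs OST actually used (the NIDs shown are just the ones quoted in this thread):

    # restore the original filesystem label
    e2label /dev/mapper/mpathx lfs-OST0008

    # create a small temporary device to hold a freshly generated mountdata
    dd if=/dev/zero of=/tmp/ost-tmp.img bs=1M count=512
    losetup /dev/loop0 /tmp/ost-tmp.img

    # format it with the ORIGINAL OST's settings, plus --replace so it does
    # not try to register with the MGS as a new OST on first mount
    mkfs.lustre --ost --replace --fsname lfs --index 8 \
        --mgsnode 10.10.0.10@o2ib --servicenode 10.10.0.13@o2ib \
        --failnode 10.10.0.14@o2ib /dev/loop0

    # mount both as ldiskfs, back up the broken mountdata, then copy the new one over
    mkdir -p /mnt/tmp /mnt/ost
    mount -t ldiskfs /dev/loop0 /mnt/tmp
    mount -t ldiskfs /dev/mapper/mpathx /mnt/ost
    cp /mnt/ost/CONFIGS/mountdata /root/mountdata.broken
    cp /mnt/tmp/CONFIGS/mountdata /mnt/ost/CONFIGS/mountdata

    # clean up
    umount /mnt/tmp /mnt/ost
    losetup -d /dev/loop0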

Hopefully with these two changes you can mount your OST again.

Cheers, Andreas

Index:      6
Lustre FS:  lfs_home
Mount type: ldiskfs
Flags:      0x1042
              (OST update no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.10.0.10@o2ib  failover.node=10.10.0.13@o2ib:10.10.0.14@o2ib


   Permanent disk data:
Target:     lfs_home-OST0006
Index:      6
Lustre FS:  lfs_home
Mount type: ldiskfs
Flags:      0x1042
              (OST update no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.10.0.10@o2ib  failover.node=10.10.0.13@o2ib:10.10.0.14@o2ib"

I guess I actually ran the mkfs command twice, so the Lustre FS in the previous values became lfs_home (originally it was lfs).

I tried to mount the partition using the backup superblocks, and all of them are empty. But from the dumpe2fs info,
"Inode count:              41943040
Block count:              10737418240
Reserved block count:     536870912
Free blocks:              1459812475
Free inodes:              39708575"
it seems there is still data on it.

The backup superblocks are for the underlying ext4/ldiskfs filesystem, so they are not really related to this problem.


So my problem is: if the data on the partition is still intact, is there any way I can rebuild the file index? And is there any way I can rewrite the CONFIGS/mountdata back to its original values?
Sorry for the lengthy messages, and I really appreciate your help!

Best Regards,

Grover

On Mon, Mar 30, 2020 at 7:14 AM Andreas Dilger <adilger at whamcloud.com> wrote:
It would be useful if you provided the actual error messages, so we can see where the problem is.

What command did you run on the OST?

Does the OST still show that it has data in it (e.g. "df" or "dumpe2fs -h" shows lots of used blocks)?
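For example (device name assumed), a quick way to compare total and free blocks is:

    # summarize the superblock; "Block count" vs "Free blocks" shows how much is in use
    dumpe2fs -h /dev/mapper/mpathx | grep -Ei 'block count|free blocks'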

On Mar 25, 2020, at 10:05, Gong-Do Hwang <grover.hwang at gmail.com> wrote:

Dear Lustre,

Months ago, when I tried to add a new disk to my new Lustre FS, I accidentally targeted mkfs.lustre at a then-mounted OST partition of another Lustre FS. Oddly enough the command went through, and without paying attention to it, I umounted the partition months later and couldn't mount it back; only then did I realize the mkfs.lustre command had actually taken effect.

But my old Lustre FS worked well through these months, so I guess the data in that OST is still there. Now, however, the permanent CONFIGS/mountdata is the new one, and I can still see my old config in the previous values.

My question is: is there any way I can write back the old CONFIGS/mountdata and still keep all my files in that OST?

I am using Lustre 2.13.0 for my MGS/MDT/OST.

Thanks for your help and I really appreciate it!

Grover

Cheers, Andreas
--
Andreas Dilger
Principal Lustre Architect
Whamcloud







