[lustre-discuss] File missing with "Invalid argument" error

Tung-Han Hsieh thhsieh at twcp1.phys.ntu.edu.tw
Mon Apr 15 01:32:57 PDT 2019


Dear All,

We are facing a serious problem after making a mistake during Lustre
(1.8.8) maintenance.

We had a bad OST and wanted to remove it, so we went to the MDS and ran

	lctl conf_param foo-OSTXXXX.osc.active=0

After doing this, there were still logs residing in the
/proc/fs/lustre/osc/ directory on the MDS. We wanted to clean those up
too, so we unmounted the whole Lustre file system (including clients and
OSTs) and ran

	tunefs.lustre --writeconf /dev/sdXX

on each OST, MDT, and MGS device. But we made a serious mistake: we
forgot to unmount the MDT and MGS first. We saw that the logs had been
cleaned before we unmounted the MDT and MGS, and the system hung when we
then tried to unmount them. After rebooting the MDS and remounting the
whole Lustre file system, we saw that the logs had been regenerated. But
after mounting the clients, we found a lot of files missing, e.g.,

ls -l /path/to/rsync_tf16_twcp1.bat

/path/to/file: Invalid argument
-????????? ?    ?    ?      ?   ?  ?     ?  /path/to/rsync_tf16_twcp1.bat

We have now unmounted the whole Lustre file system and mounted the MDT
as ldiskfs. We can see that the file ROOT/to/rsync_tf16_twcp1.bat exists,
and running "getfattr" on it still returns the following extended
attribute:

# file: rsync_tf16_twcp1.bat
trusted.lov=0s0AvRCwEAAAB9UG8JAAAAAAAAAAAAAAAAAAAQAAEAAADEjjUAAAAAAAAAAAAAAAAAAAAAAAQAAAA=

My question is: is it possible to work out from this attribute value
which OSTs hold the objects of this file? If the objects still exist on
our current OSTs, can we re-establish the connection and get the file
back?
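
For reference, here is a minimal sketch of how such a trusted.lov value
might be decoded. It assumes the attribute uses the Lustre 1.8
lov_mds_md_v1 layout (little-endian: magic, pattern, object id, object
group, stripe size, stripe count, followed by one entry per stripe with
object id, group, OST generation and OST index). The function names and
the O/<group>/d<id mod 32>/<id> object path are illustrative assumptions
that should be checked against the actual OSTs, not a confirmed recovery
procedure.

```python
import base64
import struct

# Value copied from the getfattr output above; the "0s" prefix marks base64.
XATTR = ("0s0AvRCwEAAAB9UG8JAAAAAAAAAAAAAAAAAAAQAAEAAADEjjUAAAAA"
         "AAAAAAAAAAAAAAAAAAQAAAA=")

LOV_MAGIC_V1 = 0x0BD10BD0   # magic number of the v1 striping layout
HEADER = "<IIQQII"          # assumed 1.8 header layout, 32 bytes
STRIPE = "<QQII"            # assumed per-stripe entry layout, 24 bytes


def decode_lov_v1(value):
    """Decode a getfattr-style trusted.lov value into a dict."""
    if value.startswith("0s"):
        value = value[2:]
    raw = base64.b64decode(value)
    magic, pattern, obj_id, obj_gr, stripe_size, stripe_count = \
        struct.unpack_from(HEADER, raw, 0)
    if magic != LOV_MAGIC_V1:
        raise ValueError("not a LOV v1 layout: magic 0x%08X" % magic)
    stripes = []
    offset = struct.calcsize(HEADER)
    for _ in range(stripe_count):
        oid, ogr, gen, idx = struct.unpack_from(STRIPE, raw, offset)
        stripes.append({"ost_index": idx, "object_id": oid, "group": ogr})
        offset += struct.calcsize(STRIPE)
    return {"pattern": pattern, "stripe_size": stripe_size,
            "stripe_count": stripe_count, "stripes": stripes}


def guess_object_path(stripe):
    """Assumed location of the object on an OST mounted as ldiskfs."""
    oid = stripe["object_id"]
    return "O/%d/d%d/%d" % (stripe["group"], oid % 32, oid)


if __name__ == "__main__":
    layout = decode_lov_v1(XATTR)
    print("stripe size:", layout["stripe_size"])
    for s in layout["stripes"]:
        print("OST index %d, object %d, candidate path %s"
              % (s["ost_index"], s["object_id"], guess_object_path(s)))
```

If the reported object can be found at the candidate path on the OST
with the matching index (with that OST mounted read-only as ldiskfs),
that would suggest the file data is still present.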

Sorry, but this problem is quite serious and is badly affecting our
research work, since we have lost a lot of files due to this mistake.
Any suggestions would be greatly appreciated.


Best Regards,

T.H.Hsieh

