[Lustre-discuss] files missing after writeconf

Fri Jul 9 09:51:34 PDT 2010

Unmount the MDS, mount its device as type ldiskfs, and list the ROOT directory. If there are no files there, then it seems that you have somehow deleted or reformatted the MDS filesystem.

You could also check lost+found at that point, in case your files were moved there by e2fsck for some reason.
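
As a rough sketch of the check described above (not a tested procedure): the device path and mount point are hypothetical arguments, the read-only mount option is an extra precaution of this sketch, and the run/DRY_RUN wrapper is only there so the commands can be previewed before anything is mounted.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN is set (preview mode).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: inspect_mds MDS_DEVICE MOUNT_POINT   (both arguments are hypothetical paths)
# The MDS must already be unmounted as type lustre before calling this.
inspect_mds() {
    mds_dev=$1
    mds_mnt=$2
    run mkdir -p "$mds_mnt" || return 1
    # Mount the backing filesystem directly; "ro" is a precaution added here.
    run mount -t ldiskfs -o ro "$mds_dev" "$mds_mnt" || return 1
    # The Lustre namespace lives under ROOT on the MDS device.
    run ls -l "$mds_mnt/ROOT"
    # Files relocated by e2fsck would show up here.
    run ls -l "$mds_mnt/lost+found"
    run umount "$mds_mnt"
}
```

For example, "DRY_RUN=1; inspect_mds /dev/mdsdev /mnt/mds" prints the commands without running them; unset DRY_RUN and run as root to perform the check.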

Check 'dumpe2fs -h' on the MDS device to see what the format time is (the "Filesystem created" field).
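
A small helper for pulling out just that timestamp might look like the following; fs_created is a name made up for this sketch, and it assumes the superblock header text arrives on standard input.

```shell
#!/bin/sh
# Print only the creation (format) time from "dumpe2fs -h" output read on stdin.
# fs_created is a hypothetical helper name, not part of e2fsprogs.
fs_created() {
    sed -n 's/^Filesystem created:[[:space:]]*//p'
}
```

It would be used as "dumpe2fs -h /dev/mdsdev 2>/dev/null | fs_created", where /dev/mdsdev stands in for the real MDS device. A creation time later than the date the data was written would suggest the device was reformatted.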

If there are no more files on the MDS, then the best you can do is to run lfsck, link all the orphan objects into the Lustre lost+found directory, and look at the file contents to identify them.
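
The lfsck pass could be sketched roughly as below. This assumes the Lustre-patched e2fsprogs (which provides the --mdsdb and --ostdb options and the lfsck tool), uses hypothetical database paths under /tmp, and pretends that a single host can see every device; in practice each e2fsck runs on the server that owns the device and the database files are copied to a client before lfsck is run. Check the manual for your Lustre version before relying on any of it.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN is set (preview mode).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: lfsck_link_orphans MDS_DEVICE CLIENT_MOUNT OST_DEVICE...
# All paths are hypothetical placeholders.
lfsck_link_orphans() {
    mdsdev=$1
    client_mnt=$2
    shift 2
    mdsdb=/tmp/mdsdb        # hypothetical location for the MDS database
    ostdbs=
    i=0
    # Build the MDS database; -n keeps e2fsck from modifying the device.
    run e2fsck -n -v --mdsdb "$mdsdb" "$mdsdev"
    # Build one database per OST, each checked against the MDS database.
    for ostdev in "$@"; do
        ostdb=/tmp/ostdb.$i
        run e2fsck -n -v --mdsdb "$mdsdb" --ostdb "$ostdb" "$ostdev"
        ostdbs="$ostdbs $ostdb"
        i=$((i + 1))
    done
    # -l links orphan objects into lost+found on the mounted client.
    # $ostdbs is left unquoted on purpose so each database is its own argument.
    run lfsck -l -v --mdsdb "$mdsdb" --ostdb $ostdbs "$client_mnt"
}
```

Setting DRY_RUN=1 before calling lfsck_link_orphans prints the command sequence for review without touching any device.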

If you have a backup, it would be easier to just restore from that. Sorry.

Cheers, Andreas

On 2010-07-08, at 19:34, David Gucker <dgucker at choopa.com> wrote:

> When bringing up the cluster after a full powerdown, the MDS/MGS node 
> was reporting the following for each of the OSTs:
> 
> Jul  8 17:16:18 ID6317 kernel: LustreError: 13b-9: Test01-OST0000 claims 
> to have registered, but this MGS does not know about it, preventing 
> registration.
> Jul  8 17:16:18 ID6317 kernel: LustreError: 
> 26184:0:(mgs_handler.c:660:mgs_handle()) MGS handle cmd=253 rc=-2
> 
> I have two OSSes and checked back to my mkfs commands, and it looks like 
> I forgot to enable failover in the options.  So I found that I could 
> update that flag using tunefs.lustre.  Looking into that a bit I found 
> that I should run it with --writeconf flag as well.
> 
> So, I unmounted the OSTs and ran:
> tunefs.lustre --param failover.mode=failout /dev/iscsi/ost-1.target0
> 
> on each of them. After doing this (and maybe remounting the MDS/MGS),
> I was able to mount the OSTs, and then mounted the client, but all data
> was missing. The filesystem reports 11% full, which is about right for
> the data that was on there, but shows no files.
> 
> After reading the docs a bit better I found that I should have done
> things more properly (fully shut down and unloaded the filesystem, then
> done the writeconf beginning with the MGS). So I tried running through
> the procedure a little better and the filesystem is in the same state
> (appears to be fine, just shows used space and no files).
> 
> I was unable to recreate this in another test cluster (no data loss).   
> So, I'm wondering if these files are recoverable at all?  Can anyone 
> point me in the right direction, if there is one?
> 
> Dave