[Lustre-discuss] Complicated situation of OST

thhsieh thhsieh at piano.rcas.sinica.edu.tw
Thu Jan 10 22:06:31 PST 2013


Dear All,

Sorry, but we have encountered a complicated situation with an OST in our
Lustre-1.8.7 filesystem. The story is as follows. After an unexpected
power failure, we could not mount any of the five Lustre OST partitions
located on one file server. The hardware looks OK, but it seems that the
last_rcvd and mountdata files of each partition were corrupted, because
when we run:

	/opt/lustre/sbin/tunefs.lustre --writeconf /dev/sdb1

It returns:

=========================================================================
checking for existing Lustre data: found last_rcvd
tunefs.lustre: Unable to read 1.8 config /tmp/dirMSzCCL/mountdata.
Trying 1.4 config from last_rcvd
Reading last_rcvd
Feature compat=fffa5a5a, incompat=fffa5a5a

   Read previous values:
Target:     
Index:      -370086
UUID:       ZZ??ZZ??ZZ??ZZ??ZZ??ZZ??ZZ??ZZ??ZZ??ZZ??
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x202
              (OST upgrade1.4 )
Persistent mount opts: 
Parameters:


tunefs.lustre FATAL: Must specify --mgsnode=
tunefs.lustre: exiting with 22 (Invalid argument)
=========================================================================
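
(Since the Index and UUID read back from last_rcvd are obviously garbage,
we wondered whether we could simply supply the missing values by hand,
along the lines of:

	tunefs.lustre --mgsnode=MGSNID@tcp --index=10 --writeconf /dev/sdb1

where the NID and the index are examples only. But we were not sure this
is safe while mountdata itself is unreadable, so we did not go that route.)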

We tried several ways to fix this problem:

1. Copy the correct mountdata file from another OST that has no problem
   to the broken OST's CONFIGS/mountdata (with both the broken OST and
   the healthy OST mounted as ldiskfs). Then use:

	xxd /mnt/broken_OST/CONFIGS/mountdata /tmp/mountdata.edit

   to dump it to hex, edit the OST index to that of the broken one, and
   convert it back (the full round trip is sketched after this list).

2. We have no way to fix the broken last_rcvd file, so we simply backed
   it up and deleted it from the broken OST.
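
For reference, the mountdata round trip was roughly the following (paths
are examples; the index is a binary field, so it has to be edited in the
hex dump):

	xxd /mnt/broken_OST/CONFIGS/mountdata /tmp/mountdata.edit
	vi /tmp/mountdata.edit     (change the index bytes to the broken OST's)
	xxd -r /tmp/mountdata.edit /mnt/broken_OST/CONFIGS/mountdata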

In this way we could mount the broken OST and "ls" files through the
Lustre client. But when we ran "df" on the Lustre client, the client
hung. Then we unmounted the whole Lustre filesystem and ran:

	/opt/lustre/sbin/tunefs.lustre --writeconf /dev/sdb1

again on the broken OST, but the same problem remained. Then we ran
"e2fsck" on the broken OST (it completed its work without problem) and
tried this command:

	debugfs -c -R 'ls /O/0/' /dev/sdb1

the error message is:

debugfs 1.40-WIP (14-Nov-2006)
/dev/sdb1: catastrophic mode - not reading inode or group bitmaps
/O/0/: EXT2 directory corrupted 

So we guess that, fundamentally, something in the backing ext3/ldiskfs
filesystem is still wrong. No matter how many times we run "e2fsck" on
it, the problem remains.

Then we decided to back up the data of the OST and recreate the OST.
But I think at this point I did something seriously wrong. Here are
my steps:

1. Unmount the whole Lustre filesystem.

2. Run the following command (on the MGS) to deactivate the broken OST:

	lctl conf_param foo-OST000a.osc.active=0

   (because we know that it can be reactivated later with the command:
	lctl conf_param foo-OST000a.osc.active=1 )

3. Run the command:

	tunefs.lustre --writeconf /dev/XXX

   for the MDT and all of the OSTs except the broken one.

4. Back up the data of the broken OST from its ldiskfs filesystem (our
   intended procedure for this step and step 6 is sketched after this
   list).

5. Reformat the broken OST to be a new OST:

	mkfs.lustre --fsname foo --ost --mgsnode=IPADDR /dev/sdb1

6. Mount the new OST as ldiskfs and restore its data. The broken last_rcvd
   was removed, and the mountdata was recreated in the way I mentioned
   above. We hoped that in this way a correct last_rcvd would be
   regenerated and the new OST could be mounted successfully.
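
For steps 4 and 6, our intended procedure was roughly the following
(paths are examples; this is our reading of the file-level backup method
in the operations manual, where the Lustre extended attributes have to be
saved and restored explicitly). Please tell us if this is wrong:

	cd /mnt/broken_OST
	getfattr -R -d -m '.*' -e hex -P . > /root/ost_ea.bak
	tar czf /root/ost_data.tgz --sparse .

	cd /mnt/new_OST
	tar xzpf /root/ost_data.tgz --sparse
	setfattr --restore=/root/ost_ea.bak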

However, when mounting the new OST, I got this error message:

==============================================================================
mount.lustre: mount /dev/sdb1 at /cfs/cwarp_ost2 failed: No such device or address
The target service failed to start (bad config log?) (/dev/sdc1).  See /var/log/messages.
==============================================================================

So I think I did something badly wrong in steps 2 and 3, which removed
all information about the broken OST from the MDT configuration.
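
(Maybe step 5 contributed too: we did not pass --index to mkfs.lustre,
so, if we understand it correctly, the reformatted OST was probably
assigned a brand-new index instead of the original 000a. If we had to
redo it, we suspect, though we have not verified, that forcing the
original index would have been safer:

	mkfs.lustre --fsname foo --ost --index=10 --mgsnode=IPADDR /dev/sdb1

where 10 is the decimal value of index 000a.)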

So, may I ask whether there is any other way to retrieve the data from
the broken OST? What we have now is:

1. The whole directory tree on the MDT: ROOT/, with probably correct
   extended attributes.

2. All the data on the broken OST: O/0/*.

We don't mind writing scripts or simple code to retrieve the data and
back it up, and then recreate the OST and copy the data back; a rough
sketch of what we have in mind follows.
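
For example, something like the following (untested; the struct layouts
are what we read from lustre_idl.h for 1.8, and the mount points and the
index are examples) should list, for every file recorded on the MDT, the
object(s) it needs from the broken OST, with both devices mounted
read-only as ldiskfs:

#!/usr/bin/env python
# Sketch: for each file recorded on the MDT, print the OST object(s) that
# back it, by parsing the trusted.lov EA (struct lov_mds_md_v1, whose
# magic is 0x0BD10BD0 in Lustre 1.8).
import os
import struct
import subprocess

MDT_ROOT = "/mnt/mdt/ROOT"    # example: MDT mounted read-only as ldiskfs
BROKEN_IDX = 10               # example: index of foo-OST000a
LOV_MAGIC_V1 = 0x0BD10BD0

def lov_ea(path):
    # "getfattr --only-values" dumps the raw EA bytes without any decoding
    p = subprocess.Popen(["getfattr", "-n", "trusted.lov",
                          "--only-values", path],
                         stdout=subprocess.PIPE,
                         stderr=open(os.devnull, "w"))
    return p.communicate()[0]

for dirpath, dirnames, filenames in os.walk(MDT_ROOT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        ea = lov_ea(path)
        if len(ea) < 32:
            continue                  # no striping EA on this file
        # lov_mds_md_v1 header: magic, pattern, object id, group,
        # stripe size, stripe count, padding (little-endian on disk)
        magic, pat, oid, grp, ssize, scount, pad = \
            struct.unpack("<IIQQIHH", ea[:32])
        if magic != LOV_MAGIC_V1:
            continue
        off = 32
        for i in range(scount):
            if len(ea) < off + 24:
                break                 # truncated EA, skip the rest
            # one lov_ost_data_v1 per stripe: object id, group, gen, index
            o_id, o_grp, o_gen, o_idx = \
                struct.unpack("<QQII", ea[off:off + 24])
            off += 24
            if o_idx == BROKEN_IDX:
                # on a 1.8 ldiskfs OST the object file should live at
                # O/0/d(object_id % 32)/object_id
                print("%s stripe %d -> O/0/d%d/%d"
                      % (path, i, o_id % 32, o_id))

If the mapping is right, copying each O/0/dN/ID back into the
corresponding file (stripe by stripe for striped files) should recover
the contents. Does this look reasonable?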

Besides, I would be very grateful if anyone could comment on my stupid
procedure above. I know that I have many incorrect concepts in these
steps (such as tunefs.lustre --writeconf /dev/XXX), which led to my
wrong treatment.


Thanks very much for your help.

Best Regards,

T.H.Hsieh


