[Lustre-discuss] Help reviving a 1.4.x volume with a destroyed OST

Klaus Steden klaus.steden at thomson.net
Fri May 16 19:05:02 PDT 2008


Hello there,

We had a bit of an accident in one of our labs earlier today, and it
effectively destroyed one of the OSTs in the Lustre file system. From what I
can figure (I wasn't there at the time), one of the OSSes accidentally
re-provisioned itself and installed its OS onto one of the OSTs in the
cluster. So now we've got a file system with 16 OSTs, one of which is
actually a regular Linux OS install.

We're not too worried about the data that's been lost, but we would like
to bring the file system back online with the hole in place so we can
inspect it for damage, and then reformat the damaged OST and re-insert it
into the existing file system.

I've tried doing an 'lctl --inactive <UUID> config.xml' on the OSS in
question, but it always errors out. I can't pull the UUID off the disk
itself, presumably because it was destroyed when the disk was rewritten. From
the config.xml, the UUIDs all look pretty generic -- 'ost2_UUID',
'ost7_UUID', etc. -- but if I use 'blkid' on any of the corresponding LUNs,
I get strings that resemble actual real-world UUIDs.
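
A minimal sketch of how the generic-looking names could be listed from the
XML, for comparison against what 'blkid' reports on each LUN. It assumes
nothing about the real config.xml schema beyond the names ending in '_UUID';
the SAMPLE fragment and the function name list_uuid_like are hypothetical and
only there for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Matches generic names such as 'ost2_UUID'; the pattern is an assumption
# based only on the examples quoted in this message.
UUID_LIKE = re.compile(r"^[A-Za-z0-9_.-]+_UUID$")


def list_uuid_like(xml_text):
    """Return the sorted, unique attribute values and element texts
    that look like '<name>_UUID', without assuming any schema."""
    root = ET.fromstring(xml_text)
    found = set()
    for elem in root.iter():
        # Check every attribute value on every element.
        for value in elem.attrib.values():
            if UUID_LIKE.match(value):
                found.add(value)
        # Also check the element's own text content.
        text = (elem.text or "").strip()
        if UUID_LIKE.match(text):
            found.add(text)
    return sorted(found)


# Hypothetical fragment, NOT a real Lustre config.xml.
SAMPLE = """<config>
  <node name="oss1">
    <device uuid="ost2_UUID"/>
    <device uuid="ost7_UUID"/>
  </node>
  <ref>ost2_UUID</ref>
</config>"""

if __name__ == "__main__":
    for name in list_uuid_like(SAMPLE):
        print(name)
```

To run it against a real file, the file's contents would be read and passed
in place of SAMPLE. This only lists the names that appear in the XML; it
does not recover anything from the overwritten disk.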

Is there any place I can extract the
previously-generated-and-now-sadly-destroyed UUID for the damaged OST?

Is the generic-looking UUID field in the XML file an actual UUID?

When it comes time to re-insert the OST in question back into the file
system, is it simply a matter of adding it the same way as adding a new OST,
or will I have to remove information about the previous OST if I want to
replace it inline?

I looked through the manual and Google fairly extensively, but I couldn't
quite find the information I was looking for.

Any help would be greatly appreciated!

thanks,
Klaus



