[Lustre-discuss] how to replace a bad OST.

Andreas Dilger adilger at sun.com
Tue Mar 18 16:08:15 PDT 2008


On Mar 17, 2008  11:29 -0600, Lundgren, Andrew wrote:
> I am trying to learn how to replace a defective OST with a new one.
> Assuming the old OST can not be salvaged.
> 
> I have a test cluster that I am working on.
> 
> I deactivated the volume on the MGS using:
> 
> lctl conf_param content-OST0002-osc.osc.active=0
> 
> I unlinked all of the bad files by finding the ones on the bad volume.
> 
> I formatted a fresh OST using the index number of the bad device:
> 
> mkfs.lustre --reformat  --fsname content --ost --mgsnode=4.248.52.81@tcp0 --param="failover.mode=failout" --index=02 /dev/md6

You do not necessarily want to add the new OST in the same slot as the
old one.  There are a few complications with doing that, in particular:
- the MDS will think that the new OST has objects up to the last object
  ID the old OST had, and when the new OST is first started it will
  recreate them.  That will take a long time, and waste a lot of space
  on the OST, maybe all of the inodes in the whole filesystem.
- if you missed removing some of the bad files by accident, they will
  be treated as if the new OST is the same as the old one.  Not fatal,
  but you would probably prefer to get an IO error back instead of just
  a zero-length file.
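
As a rough sketch of the alternative (not a tested procedure), the
replacement could be formatted without --index so that it does not take
over slot 0002, after first checking for files that still reference the
dead OST.  The script below only prints the commands; it does not run
them.  The client mount point /mnt/content is hypothetical, and it is an
assumption that on this release an OST formatted without --index is
given the next free index when it first registers with the MGS.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for review, runs nothing.
FSNAME=content
MGSNODE=4.248.52.81@tcp0
DEVICE=/dev/md6
OLD_OST=${FSNAME}-OST0002
MOUNTPOINT=/mnt/content    # hypothetical client mount point

# Command to list files that still have objects on the dead OST.
find_leftover_cmd() {
    printf 'lfs find --obd %s_UUID %s\n' "$OLD_OST" "$MOUNTPOINT"
}

# Command to format the replacement with no --index, on the assumption
# that it then gets a new slot instead of reusing the old one.
new_ost_cmd() {
    printf 'mkfs.lustre --reformat --fsname %s --ost --mgsnode=%s --param="failover.mode=failout" %s\n' \
        "$FSNAME" "$MGSNODE" "$DEVICE"
}

find_leftover_cmd
new_ost_cmd
```

The old OST would stay deactivated in this case, so any file that was
missed should return an IO error rather than appear as a zero-length
file.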


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



