[Lustre-discuss] how to replace a bad OST.
Lundgren, Andrew
Andrew.Lundgren at Level3.com
Wed Mar 19 09:22:13 PDT 2008
What is the best way to do this, then, when you know an OST cannot be recovered and you don't want your cluster to keep an entry that is permanently offline?
--
Andrew
> -----Original Message-----
> From: Andreas.Dilger at sun.com [mailto:Andreas.Dilger at sun.com] On Behalf Of
> Andreas Dilger
> Sent: Tuesday, March 18, 2008 5:08 PM
> To: Lundgren, Andrew
> Cc: 'Lustre-discuss at clusterfs.com'; Nathaniel Rutman
> Subject: Re: [Lustre-discuss] how to replace a bad OST.
>
> On Mar 17, 2008 11:29 -0600, Lundgren, Andrew wrote:
> > I am trying to learn how to replace a defective OST with a new one,
> > assuming the old OST cannot be salvaged.
> >
> > I have a test cluster that I am working on.
> >
> > I deactivated the volume on the MGS using:
> >
> > lctl conf_param content-OST0002-osc.osc.active=0
> >
> > I unlinked all of the bad files by finding the ones on the bad volume.
> >
> > I formatted a fresh OST using the index number of the bad device:
> >
> > mkfs.lustre --reformat --fsname content --ost --mgsnode=4.248.52.81@tcp0 --param="failover.mode=failout" --index=02 /dev/md6
>
> You do not necessarily want to add the new OST in the same slot as the
> old one. There are a few complications with doing that, in particular:
> - the MDS will think that the new OST has objects up to what the old OST
> had, and when the new OST is first started it will recreate them.
> That will take a long time and waste a lot of space on the OST, maybe
> all of the inodes in the whole filesystem.
> - if you missed removing some of the bad files by accident, they will
> be treated as if the new OST were the same as the old one. Not fatal,
> but you would probably prefer to get an IO error back instead of just
> a zero-length file.
>
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.