[lustre-discuss] How to remove an OST completely

Tung-Han Hsieh thhsieh at twcp1.phys.ntu.edu.tw
Thu Mar 4 20:56:54 PST 2021


Dear Angelos,

On Fri, Mar 05, 2021 at 12:15:19PM +0800, Angelos Ching via lustre-discuss wrote:
> Hi TH,
> 
> I think you'll have to set max_create_count=20000 after step 7 unless you
> unmount and remount your MDT.

Yes. You are right. We have to set max_create_count=20000 for the replaced
OST, otherwise it will not accept newly created files.

> And for step 4, I used conf_param instead of set_param during my drill and I
> noticed this might be more resilient if you are using a HA pair for the MDT
> because the MDS might try to activate the inactive OST during failover as
> set_param is only changing run time option?
> 
> Regards,
> Angelos

I am concerned that the replacement of the OST may sometimes take a
long time. In between, we may encounter other events that require
rebooting the MDT servers. I am only sure that we can deactivate / reactivate
the OST by conf_param when the MDT server is not rebooted. Once the MDT server
is rebooted after deactivating the OST with conf_param (active=0), I am not
sure whether it can be reactivated or not.
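
One way to check what state the MDT actually ends up in after a reboot or
failover is to filter the output of "lctl get_param osc.*.active" for
entries with active=0. The following is only a minimal sketch: the helper
name inactive_oscs is hypothetical (it is not a Lustre tool), and it assumes
the output format is exactly "osc.<device>.active=<0|1>", one entry per line.

```shell
#!/bin/sh
# Hypothetical helper: read "lctl get_param osc.*.active" output on stdin
# and print the OSC device names whose active flag is 0.
# Assumes lines of the form: osc.<device>.active=<0|1>
inactive_oscs() {
    sed -n 's/^osc\.\(.*\)\.active=0$/\1/p'
}
```

On the MDT server this could be used as:

   lctl get_param osc.*.active | inactive_oscs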

So I probably missed another step. Between steps 6 and 7, do we need to
reactivate the old OST index before mounting the new OST?

6. Prepare the new OST for replacement by mkfs.lustre with --replace
   option, and set the index to the old OST index (e.g., 0x8):
   ....

6.5. Reactivate the old OST index:

   lctl set_param osc.chome-OST0008-osc-MDT0000.active=1

7. Mount the new OST (run in the new OST server).

8. Release the new OST for accepting new objects:

   lctl set_param osc.chome-OST0008-osc-MDT0000.max_create_count=20000
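
To avoid typing the device name by hand four times, the MDT-side commands
from steps 1, 4, 6.5 and 8 can be generated from the file system name and
the OST / MDT indices. This is a rough sketch, not a tested procedure: the
function names osc_name and mds_replace_cmds are hypothetical, the 4-digit
lowercase hex index format is an assumption based on the device names in
this thread, and the script only prints the commands; it does not run them.

```shell
#!/bin/sh
# Hypothetical helper: build the MDT-side OSC device name.
# Arguments: <fsname> <OST index> <MDT index>; indices may be decimal or 0x-hex.
# Assumption: Lustre formats both indices as 4-digit lowercase hex.
osc_name() {
    printf '%s-OST%04x-osc-MDT%04x\n' "$1" "$2" "$3"
}

# Print (do NOT execute) the MDT-side commands for replacing one OST,
# in the order discussed above. The value 20000 is the one used in this thread.
mds_replace_cmds() {
    _dev=$(osc_name "$1" "$2" "$3")
    echo "lctl set_param osc.$_dev.max_create_count=0"      # step 1: stop new objects
    echo "lctl set_param osc.$_dev.active=0"                # step 4: deactivate old OST
    echo "lctl set_param osc.$_dev.active=1"                # step 6.5: reactivate index
    echo "lctl set_param osc.$_dev.max_create_count=20000"  # step 8: accept new objects
}
```

For example, "mds_replace_cmds chome 0x8 0" prints the four commands for
chome-OST0008-osc-MDT0000. The client-side and OST-side steps (migrating
files, unmounting, mkfs.lustre --replace, mounting) still have to be done
between them as listed.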


Cheers,

T.H.Hsieh


> On 05/03/2021 11:48, Tung-Han Hsieh via lustre-discuss wrote:
> > Dear Hans,
> > 
> > Thank you very much. Replacing the OST is new to me and very very
> > useful. We will try it next time.
> > 
> > So, according to the description of the manual, to replace the OST
> > we probably need to:
> > 
> > 1. Lock the old OST (e.g., chome-OST0008) such that it will not
> >     create new files (run in the MDT server):
> > 
> >     lctl set_param osc.chome-OST0008-osc-MDT0000.max_create_count=0
> > 
> > 2. Locate the list of files in the old OST: (e.g., chome-OST0008):
> >     (run in the client):
> > 
> >     lfs find --obd chome-OST0008_UUID /home > /tmp/OST0008.txt
> > 
> > 3. Migrate the listed files in /tmp/OST0008.txt out of the old OST.
> >     (run in the client).
> > 
> > 4. Remove the old OST temporarily (run in the MDT server):
> > 
> >     lctl set_param osc.chome-OST0008-osc-MDT0000.active=0
> > 
> >     (Note: should use "set_param" instead of "conf_param")
> > 
> > 5. Unmount the old OST partition (run in the old OST server)
> > 
> > 6. Prepare the new OST for replacement by mkfs.lustre with --replace
> >     option, and set the index to the old OST index (e.g., 0x8):
> >     (run in the new OST server)
> > 
> >     mkfs.lustre --ost --mgsnode=XXXXXX --index=0x8 --replace <device_name>
> > 
> > 7. Mount the new OST (run in the new OST server).
> > 
> > 
> > Best Regards,
> > 
> > T.H.Hsieh
> > 
> > 
> > On Thu, Mar 04, 2021 at 04:59:54PM +0100, Hans Henrik Happe via lustre-discuss wrote:
> > > Hi,
> > > 
> > > The manual describes this:
> > > 
> > > https://doc.lustre.org/lustre_manual.xhtml#lustremaint.remove_ost
> > > 
> > > There is a note telling you that it will still be there, but can be
> > > replaced.
> > > 
> > > Hope you migrated your data away from the OST also. Otherwise you would
> > > have lost it.
> > > 
> > > Cheers,
> > > Hans Henrik
> > > 
> > > On 03.03.2021 11.22, Tung-Han Hsieh via lustre-discuss wrote:
> > > > Dear All,
> > > > 
> > > > Here is a question about how to remove an OST completely without
> > > > restarting the Lustre file system. Our Lustre version is 2.12.6.
> > > > 
> > > > We did the following steps to remove the OST:
> > > > 
> > > > 1. Lock the OST (e.g., chome-OST0008) such that it will not create
> > > >     new files (run in the MDT server):
> > > > 
> > > >     lctl set_param osc.chome-OST0008-osc-MDT0000.max_create_count=0
> > > > 
> > > > 2. Locate the list of files in the target OST: (e.g., chome-OST0008):
> > > >     (run in the client):
> > > > 
> > > >     lfs find --obd chome-OST0008_UUID /home
> > > > 
> > > > 3. Remove OST (run in the MDT server):
> > > >     lctl conf_param osc.chome-OST0008-osc-MDT0000.active=0
> > > > 
> > > > 4. Unmount the OST partition (run in the OST server)
> > > > 
> > > > After that, the total size of the Lustre file system decreased, and
> > > > everything looks fine. However, without restarting (i.e., rebooting
> > > > the Lustre MDT / OST servers), it appears that the removed OST
> > > > still exists. For example, on the MDT:
> > > > 
> > > > # lctl get_param osc.*.active
> > > > osc.chome-OST0000-osc-MDT0000.active=1
> > > > osc.chome-OST0001-osc-MDT0000.active=1
> > > > osc.chome-OST0002-osc-MDT0000.active=1
> > > > osc.chome-OST0003-osc-MDT0000.active=1
> > > > osc.chome-OST0008-osc-MDT0000.active=0
> > > > osc.chome-OST0010-osc-MDT0000.active=1
> > > > osc.chome-OST0011-osc-MDT0000.active=1
> > > > osc.chome-OST0012-osc-MDT0000.active=1
> > > > osc.chome-OST0013-osc-MDT0000.active=1
> > > > osc.chome-OST0014-osc-MDT0000.active=1
> > > > 
> > > > We still see chome-OST0008. And in dmesg of MDT, we see a lot of:
> > > > 
> > > > LustreError: 4313:0:(osp_object.c:594:osp_attr_get()) chome-OST0008-osc-MDT0000:osp_attr_get update error [0x100080000:0x10a54c:0x0]: rc = -108
> > > > 
> > > > In addition, when running LFSCK in the MDT server:
> > > > 
> > > > 	lctl lfsck_start -A
> > > > 
> > > > even after all the work on the MDT and OSTs has completed, we still see
> > > > that (run on the MDT server):
> > > > 
> > > > 	lctl get_param mdd.*.lfsck_layout
> > > > 
> > > > the status is not completed:
> > > > 
> > > > mdd.chome-MDT0000.lfsck_layout=
> > > > name: lfsck_layout
> > > > magic: 0xb1732fed
> > > > version: 2
> > > > status: partial
> > > > flags: incomplete
> > > > param: all_targets
> > > > last_completed_time: 1614762495
> > > > time_since_last_completed: 4325 seconds
> > > > ....
> > > > 
> > > > We suspect that the "incomplete" part might be due to the already removed
> > > > chome-OST0008.
> > > > 
> > > > Is there any way to completely remove chome-OST0008 from the Lustre
> > > > file system? That OST device has already been reformatted for
> > > > other use.
> > > > 
> > > > Thanks very much.
> > > > 
> > > > 
> > > > T.H.Hsieh
> > > > _______________________________________________
> > > > lustre-discuss mailing list
> > > > lustre-discuss at lists.lustre.org
> > > > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

