[lustre-discuss] How to remove an OST completely

Zeeshan Ali Shah javaclinic at gmail.com
Mon Mar 8 00:18:53 PST 2021


Dear Tung-Han, even after all the above steps the OST still appears? We
did the same for 3 OSTs in our centre and they disappeared correctly.


Zeeshan

On Mon, Mar 8, 2021 at 11:15 AM Tung-Han Hsieh <
thhsieh at twcp1.phys.ntu.edu.tw> wrote:

> Dear Zeeshan,
>
> Yes. We used lfs_migrate to move the data out of the OST we were going
> to remove, then deactivated it, and then unmounted it. We have verified
> that after the whole process no data was lost; only the total space of
> the Lustre file system decreased due to the removed OST.
>
> Best Regards,
>
> T.H.Hsieh
>
>
> On Mon, Mar 08, 2021 at 11:08:26AM +0300, Zeeshan Ali Shah wrote:
> > Did you unmount the OST? Remember to lfs_migrate the data first,
> > otherwise the old data would give errors.
> >
> > On Fri, Mar 5, 2021 at 11:59 AM Etienne Aujames via lustre-discuss <
> > lustre-discuss at lists.lustre.org> wrote:
> >
> > > Hello,
> > >
> > > There is some work in progress on LU-7668 to remove an OST directly
> > > from the MGS configuration.
> > >
> > > In the comment section Andreas describes a way to remove an OST with
> > > llog_print and llog_cancel (see https://review.whamcloud.com/41449).
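> > >
> > > For reference, a rough sketch of that approach, run on the MGS (the
> > > exact llog_cancel syntax differs between Lustre versions, so the
> > > index option below is an assumption; check lctl-llog_cancel(8)):
> > >
> > >     lctl --device MGS llog_print chome-client
> > >         (note the indexes of the records that reference chome-OST0008)
> > >     lctl --device MGS llog_cancel chome-client --log_idx=<index>
> > >         (repeat for each matching record, and for the chome-MDT0000 log)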
> > >
> > > Stephane Thiell has submitted a patch (
> > > https://review.whamcloud.com/41449/) to implement this process
> > > directly as an lctl command, "del_ost".
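> > >
> > > For reference, the command proposed there looks roughly like the
> > > following (the option name is taken from the review at that time and
> > > may still change before the patch lands, so treat it as an assumption):
> > >
> > >     lctl del_ost --target chome-OST0008
> > >         (run on the MGS; cancels the OST records in the MGS
> > >         configuration logs)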
> > >
> > > This process can be applied live, but the changes take effect only
> > > after the whole system is remounted (when the MGS configuration is
> > > re-read by the clients/MDTs).
> > >
> > > This process does not replace the migration/locking steps.
> > >
> > > We have tested this process in production, but for now it may be a
> > > bit risky, so I recommend backing up the MGS configuration first.
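> > >
> > > For the backup, a minimal sketch assuming an ldiskfs MGT (the device
> > > and mount point names are placeholders): stop the MGS, mount the MGT
> > > as ldiskfs, and save the CONFIGS directory, for example:
> > >
> > >     umount /mnt/mgt
> > >     mount -t ldiskfs /dev/<mgt_device> /mnt/mgt
> > >     tar czf /root/mgs-configs-backup.tgz -C /mnt/mgt CONFIGS
> > >     umount /mnt/mgt
> > >     mount -t lustre /dev/<mgt_device> /mnt/mgt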
> > >
> > > Best regards.
> > >
> > > Etienne AUJAMES
> > >
> > > On Fri, 2021-03-05 at 12:56 +0800, Tung-Han Hsieh via lustre-discuss
> > > wrote:
> > > > Dear Angelos,
> > > >
> > > > On Fri, Mar 05, 2021 at 12:15:19PM +0800, Angelos Ching via lustre-
> > > > discuss wrote:
> > > > > Hi TH,
> > > > >
> > > > > I think you'll have to set max_create_count=20000 after step 7
> > > > > unless you
> > > > > unmount and remount your MDT.
> > > >
> > > > Yes. You are right. We have to set max_create_count=20000 for the
> > > > replaced
> > > > OST, otherwise it will not accept newly created files.
> > > >
> > > > > And for step 4, I used conf_param instead of set_param during my
> > > > > drill, and I noticed this might be more resilient if you are using
> > > > > an HA pair for the MDT, because the MDS might try to activate the
> > > > > inactive OST during failover, as set_param only changes the runtime
> > > > > option?
> > > > >
> > > > > Regards,
> > > > > Angelos
> > > >
> > > > My concern is that the replacement of the OST may sometimes take a
> > > > long time, and in between we may encounter other events that require
> > > > rebooting the MDT servers. I am only sure that we can deactivate /
> > > > reactivate the OST via conf_param while the MDT server has not been
> > > > rebooted. Once the MDT server is rebooted after setting active=0 via
> > > > conf_param on the OST, I am not sure whether it can be recovered or
> > > > not.
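> > > >
> > > > (For reference, a conf_param setting is stored in the MGS
> > > > configuration and therefore persists across an MDS reboot or
> > > > failover, while set_param only changes the runtime state. Using the
> > > > names from this thread, the two forms are:
> > > >
> > > >     lctl conf_param chome-OST0008.osc.active=0              (on the MGS)
> > > >     lctl set_param osc.chome-OST0008-osc-MDT0000.active=0   (on the MDS)
> > > >
> > > > and the conf_param form can later be undone with active=1.)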
> > > >
> > > > So probably I missed another step: between steps 6 and 7, do we need
> > > > to reactivate the old OST index before mounting the new OST?
> > > >
> > > > 6. Prepare the new OST for replacement by mkfs.lustre with --replace
> > > >    option, and set the index to the old OST index (e.g., 0x8):
> > > >    ....
> > > >
> > > > 6.5. Reactivate the old OST index:
> > > >
> > > >    lctl set_param osc.chome-OST0008-osc-MDT0000.active=1
> > > >
> > > > 7. Mount the new OST (run in the new OST server).
> > > >
> > > > 8. Release the new OST for accepting new objects:
> > > >
> > > >    lctl set_param osc.chome-OST0008-osc-MDT0000.max_create_count=20000
> > > >
> > > >
> > > > Cheers,
> > > >
> > > > T.H.Hsieh
> > > >
> > > >
> > > > > On 05/03/2021 11:48, Tung-Han Hsieh via lustre-discuss wrote:
> > > > > > Dear Hans,
> > > > > >
> > > > > > Thank you very much. Replacing the OST is new to me and very
> > > > > > useful. We will try it next time.
> > > > > >
> > > > > > So, according to the manual, to replace the OST we probably need to:
> > > > > >
> > > > > > 1. Lock the old OST (e.g., chome-OST0008) such that it will not
> > > > > >     create new files (run in the MDT server):
> > > > > >
> > > > > >     lctl set_param osc.chome-OST0008-osc-MDT0000.max_create_count=0
> > > > > >
> > > > > > 2. Locate the list of files in the old OST (e.g., chome-OST0008):
> > > > > >     (run in the client):
> > > > > >
> > > > > >     lfs find --obd chome-OST0008_UUID /home > /tmp/OST0008.txt
> > > > > >
> > > > > > 3. Migrate the listed files in /tmp/OST0008.txt out of the old OST
> > > > > >     (run in the client; see the lfs_migrate sketch after this list).
> > > > > >
> > > > > > 4. Remove the old OST temporarily (run in the MDT server):
> > > > > >
> > > > > >     lctl set_param osc.chome-OST0008-osc-MDT0000.active=0
> > > > > >
> > > > > >     (Note: should use "set_param" instead of "conf_param")
> > > > > >
> > > > > > 5. Unmount the old OST partition (run in the old OST server)
> > > > > >
> > > > > > 6. Prepare the new OST for replacement by mkfs.lustre with the
> > > > > >     --replace option, and set the index to the old OST index
> > > > > >     (e.g., 0x8): (run in the new OST server)
> > > > > >
> > > > > >     mkfs.lustre --ost --mgsnode=XXXXXX --index=0x8 --replace <device_name>
> > > > > >
> > > > > > 7. Mount the new OST (run in the new OST server).
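> > > > > >
> > > > > > For step 3, a minimal sketch of the migration (run in the client;
> > > > > > lfs_migrate is the helper script shipped with Lustre):
> > > > > >
> > > > > >     lfs find --obd chome-OST0008_UUID /home | lfs_migrate -y
> > > > > >
> > > > > >     (or feed it the saved list: lfs_migrate -y < /tmp/OST0008.txt)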
> > > > > >
> > > > > >
> > > > > > Best Regards,
> > > > > >
> > > > > > T.H.Hsieh
> > > > > >
> > > > > >
> > > > > > On Thu, Mar 04, 2021 at 04:59:54PM +0100, Hans Henrik Happe via
> > > > > > lustre-discuss wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > The manual describes this:
> > > > > > >
> > > > > > >
> https://doc.lustre.org/lustre_manual.xhtml#lustremaint.remove_ost
> > > > > > >
> > > > > > > There is a note telling you that the OST will still be there,
> > > > > > > but that it can be replaced.
> > > > > > >
> > > > > > > I hope you also migrated your data away from the OST; otherwise
> > > > > > > you would have lost it.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Hans Henrik
> > > > > > >
> > > > > > > On 03.03.2021 11.22, Tung-Han Hsieh via lustre-discuss wrote:
> > > > > > > > Dear All,
> > > > > > > >
> > > > > > > > Here is a question about how to remove an OST completely
> > > > > > > > without
> > > > > > > > restarting the Lustre file system. Our Lustre version is
> > > > > > > > 2.12.6.
> > > > > > > >
> > > > > > > > We did the following steps to remove the OST:
> > > > > > > >
> > > > > > > > 1. Lock the OST (e.g., chome-OST0008) such that it will not
> > > > > > > > create
> > > > > > > >     new files (run in the MDT server):
> > > > > > > >
> > > > > > > >     lctl set_param osc.chome-OST0008-osc-MDT0000.max_create_count=0
> > > > > > > >
> > > > > > > > 2. Locate the list of files in the target OST (e.g., chome-OST0008):
> > > > > > > >     (run in the client):
> > > > > > > >
> > > > > > > >     lfs find --obd chome-OST0008_UUID /home
> > > > > > > >
> > > > > > > > 3. Remove OST (run in the MDT server):
> > > > > > > >     lctl conf_param osc.chome-OST0008-osc-MDT0000.active=0
> > > > > > > >
> > > > > > > > 4. Unmount the OST partition (run in the OST server)
> > > > > > > >
> > > > > > > > After that, the total size of the Lustre file system decreased,
> > > > > > > > and everything looked fine. However, without restarting (i.e.,
> > > > > > > > rebooting the Lustre MDT / OST servers), it seems that the
> > > > > > > > removed OST still exists. For example, on the MDT:
> > > > > > > >
> > > > > > > > # lctl get_param osc.*.active
> > > > > > > > osc.chome-OST0000-osc-MDT0000.active=1
> > > > > > > > osc.chome-OST0001-osc-MDT0000.active=1
> > > > > > > > osc.chome-OST0002-osc-MDT0000.active=1
> > > > > > > > osc.chome-OST0003-osc-MDT0000.active=1
> > > > > > > > osc.chome-OST0008-osc-MDT0000.active=0
> > > > > > > > osc.chome-OST0010-osc-MDT0000.active=1
> > > > > > > > osc.chome-OST0011-osc-MDT0000.active=1
> > > > > > > > osc.chome-OST0012-osc-MDT0000.active=1
> > > > > > > > osc.chome-OST0013-osc-MDT0000.active=1
> > > > > > > > osc.chome-OST0014-osc-MDT0000.active=1
> > > > > > > >
> > > > > > > > We still see chome-OST0008. And in dmesg on the MDT, we see a
> > > > > > > > lot of messages like:
> > > > > > > >
> > > > > > > > LustreError: 4313:0:(osp_object.c:594:osp_attr_get()) chome-OST0008-osc-MDT0000:osp_attr_get update error [0x100080000:0x10a54c:0x0]: rc = -108
> > > > > > > >
> > > > > > > > In addition, when running LFSCK in the MDT server:
> > > > > > > >
> > > > > > > >       lctl lfsck_start -A
> > > > > > > >
> > > > > > > > even after all the work on the MDT and OSTs has completed, we
> > > > > > > > still see (run in the MDT server):
> > > > > > > >
> > > > > > > >       lctl get_param mdd.*.lfsck_layout
> > > > > > > >
> > > > > > > > the status is not completed:
> > > > > > > >
> > > > > > > > mdd.chome-MDT0000.lfsck_layout=
> > > > > > > > name: lfsck_layout
> > > > > > > > magic: 0xb1732fed
> > > > > > > > version: 2
> > > > > > > > status: partial
> > > > > > > > flags: incomplete
> > > > > > > > param: all_targets
> > > > > > > > last_completed_time: 1614762495
> > > > > > > > time_since_last_completed: 4325 seconds
> > > > > > > > ....
> > > > > > > >
> > > > > > > > We suspect that the "incomplete" flag might be due to the
> > > > > > > > already removed chome-OST0008.
> > > > > > > >
> > > > > > > > Is there any way to completely remove chome-OST0008 from the
> > > > > > > > Lustre file system? That OST device has already been
> > > > > > > > reformatted for other use.
> > > > > > > >
> > > > > > > > Thanks very much.
> > > > > > > >
> > > > > > > >
> > > > > > > > T.H.Hsieh

