[lustre-discuss] How to remove an OST completely

Tung-Han Hsieh thhsieh at twcp1.phys.ntu.edu.tw
Mon Mar 8 00:58:04 PST 2021


Dear Zeeshan,

In our case, it looks like the removed OST has disappeared. However,
sometimes we notice that a "shadow" of the removed OST still exists in
the Lustre file system.

In our system, running "lctl get_param osc.*.ost_conn_uuid" shows:

osc.chome-OST0000-osc-MDT0000.ost_conn_uuid=192.168.32.242@o2ib
osc.chome-OST0001-osc-MDT0000.ost_conn_uuid=192.168.32.241@o2ib
osc.chome-OST0002-osc-MDT0000.ost_conn_uuid=192.168.32.241@o2ib
osc.chome-OST0003-osc-MDT0000.ost_conn_uuid=192.168.32.241@o2ib
osc.chome-OST0008-osc-MDT0000.ost_conn_uuid=192.168.32.243@o2ib
osc.chome-OST0010-osc-MDT0000.ost_conn_uuid=192.168.32.241@o2ib
osc.chome-OST0011-osc-MDT0000.ost_conn_uuid=192.168.32.243@o2ib
osc.chome-OST0012-osc-MDT0000.ost_conn_uuid=192.168.32.243@o2ib
osc.chome-OST0013-osc-MDT0000.ost_conn_uuid=192.168.32.243@o2ib
osc.chome-OST0014-osc-MDT0000.ost_conn_uuid=192.168.32.243@o2ib

Note that OST0008 is the one we removed earlier. In fact, the server
used to have the following OSTs:

OST0008, OST0009, OST000a, OST000b, OST000c, OST000d

which were all ldiskfs backend partitions. We wanted to convert them to
ZFS backend partitions. So, one by one, we locked them to prevent the
creation of new files, followed the lfs_migrate procedure to move all
the data out, and deactivated them with:

	lctl conf_param chome-OST0008-osc-MDT0000.osc.active=0

and finally unmounted them from the OST server. Whenever an OST was
successfully unmounted, we verified that no data was lost. The whole
process took several months because we kept the system in production
the entire time.
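
For reference, each drain cycle looked roughly like the following
(a sketch; the OST index, file system path, and mount point are
examples, so adjust them to your setup):

    # On the MDT server: stop new object creation on this OST
    lctl set_param osc.chome-OST0008-osc-MDT0000.max_create_count=0

    # On a client: migrate all files off this OST
    lfs find --obd chome-OST0008_UUID /home | lfs_migrate -y

    # On the MDT server: deactivate the OST permanently
    lctl conf_param chome-OST0008-osc-MDT0000.osc.active=0

    # On the OST server: unmount it (hypothetical mount point)
    umount /lustre/ost0008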

After the final one, OST0008, was removed, we rebooted the OST server,
reinstalled Lustre with the ZFS backend, repartitioned the storage,
reformatted the partitions, and remounted them as OST0011, OST0012,
OST0013, and OST0014. Then, just as we were ready to celebrate having
finally finished this long and complicated task, we suddenly found that
the "shadow" of OST0008 was still there, while the other old OSTs seem
to have really disappeared.

It is still unclear why only OST0008 behaves differently. We guess we
may need to shut down the Lustre file system completely, reboot the MDT
server, and probably run "tunefs.lustre --writeconf" on the MDT and all
the OSTs in order to clear out OST0008 completely. But we need to find a
chance to do that, because a lot of users are quite busy on our system.
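
If we do end up doing the writeconf, the procedure would presumably
look something like this (a sketch; the device paths are hypothetical,
and all clients, OSTs, and the MDT must be stopped first):

    # On the MDT server, after unmounting the whole file system:
    tunefs.lustre --writeconf /dev/mdt_device

    # On each OST server, for every remaining OST:
    tunefs.lustre --writeconf /dev/ost_device

    # Then remount the MGS/MDT first, followed by the OSTs and the
    # clients; the configuration logs are regenerated from the live
    # targets, so OST0008 should no longer appear.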

Best Regards,

T.H.Hsieh


On Mon, Mar 08, 2021 at 11:18:53AM +0300, Zeeshan Ali Shah wrote:
> Dear Tung-Han, even after all the above steps the OST still appears? We
> did the same for 3 OSTs at our centre and they disappeared correctly.
> 
> 
> Zeeshan
> 
> On Mon, Mar 8, 2021 at 11:15 AM Tung-Han Hsieh <
> thhsieh at twcp1.phys.ntu.edu.tw> wrote:
> 
> > Dear Zeeshan,
> >
> > Yes. We did lfs_migrate to move the data out of the OST we were going
> > to remove, then deactivated it, and then unmounted it. We have
> > verified that after the whole process no data was lost; only the
> > total space of the Lustre file system decreased, due to the removed
> > OST.
> >
> > Best Regards,
> >
> > T.H.Hsieh
> >
> >
> > On Mon, Mar 08, 2021 at 11:08:26AM +0300, Zeeshan Ali Shah wrote:
> > > Did you unmount the OST? Remember to lfs_migrate the data, otherwise the
> > > old data would give errors.
> > >
> > > On Fri, Mar 5, 2021 at 11:59 AM Etienne Aujames via lustre-discuss <
> > > lustre-discuss at lists.lustre.org> wrote:
> > >
> > > > Hello,
> > > >
> > > > There is some work in progress under LU-7668 to remove an OST
> > > > directly from the MGS configuration.
> > > >
> > > > In the comment section Andreas describes a way to remove an OST with
> > > > llog_print and llog_cancel (see https://review.whamcloud.com/41449).
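> > > >
> > > > Roughly, on the MGS (a sketch based on that discussion; the log
> > > > name follows the fsname, and the record indexes must be taken
> > > > from the llog_print output before cancelling anything):
> > > >
> > > >    # list the client configuration log and find the records
> > > >    # that reference the removed OST:
> > > >    lctl --device MGS llog_print chome-client
> > > >    # cancel each such record by its index (123 is an example):
> > > >    lctl --device MGS llog_cancel chome-client --log_idx=123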
> > > >
> > > > Stephane Thiell has submitted a patch (
> > > > https://review.whamcloud.com/41449/) to implement this process
> > > > directly inside an lctl command, "del_ost".
> > > >
> > > > This process can be applied live, but the changes will take effect
> > > > only after a whole-system remount (when the MGS configuration is
> > > > re-read by the clients/MDTs).
> > > >
> > > > This process does not replace the migrate/locking parts.
> > > >
> > > > We tested this process in production, but for now it may be a bit
> > > > risky, so I recommend backing up the MGS configuration first.
> > > >
> > > > Best regards.
> > > >
> > > > Etienne AUJAMES
> > > >
> > > > On Fri, 2021-03-05 at 12:56 +0800, Tung-Han Hsieh via lustre-discuss
> > > > wrote:
> > > > > Dear Angelos,
> > > > >
> > > > > On Fri, Mar 05, 2021 at 12:15:19PM +0800, Angelos Ching via
> > > > > lustre-discuss wrote:
> > > > > > Hi TH,
> > > > > >
> > > > > > I think you'll have to set max_create_count=20000 after step 7
> > > > > > unless you unmount and remount your MDT.
> > > > >
> > > > > Yes. You are right. We have to set max_create_count=20000 for the
> > > > > replaced OST, otherwise it will not accept newly created files.
> > > > >
> > > > > > And for step 4, I used conf_param instead of set_param during my
> > > > > > drill, and I noticed this might be more resilient if you are using
> > > > > > an HA pair for the MDT, because the MDS might try to activate the
> > > > > > inactive OST during failover, as set_param only changes a run-time
> > > > > > option?
> > > > > >
> > > > > > Regards,
> > > > > > Angelos
> > > > >
> > > > > My concern is that the replacement of the OST may sometimes take a
> > > > > long time, and in between we may encounter other events that require
> > > > > rebooting the MDT servers. I am only sure that we can deactivate /
> > > > > reactivate the OST via conf_param while the MDT server has not been
> > > > > rebooted. Once the MDT server is rebooted after setting conf_param
> > > > > ...active=0 on the OST, I am not sure whether it can be recovered
> > > > > back or not.
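> > > > >
> > > > > Presumably it could be re-enabled the same way it was disabled
> > > > > (a guess on my part, not yet tested here):
> > > > >
> > > > >    lctl conf_param chome-OST0008-osc-MDT0000.osc.active=1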
> > > > >
> > > > > So probably I missed another step: between steps 6 and 7, do we need
> > > > > to reactivate the old OST before mounting the new OST?
> > > > >
> > > > > 6. Prepare the new OST for replacement by mkfs.lustre with --replace
> > > > >    option, and set the index to the old OST index (e.g., 0x8):
> > > > >    ....
> > > > >
> > > > > 6.5. Reactivate the old OST index:
> > > > >
> > > > >    lctl set_param osc.chome-OST0008-osc-MDT0000.active=1
> > > > >
> > > > > 7. Mount the new OST (run in the new OST server).
> > > > >
> > > > > 8. Release the new OST for accepting new objects:
> > > > >
> > > > >    lctl set_param osc.chome-OST0008-osc-MDT0000.max_create_count=20000
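> > > > >
> > > > > And perhaps a quick check afterwards that the OST is back and
> > > > > accepting objects again (a sketch):
> > > > >
> > > > >    lctl get_param osc.chome-OST0008-osc-MDT0000.max_create_count
> > > > >    lfs df /home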
> > > > >
> > > > >
> > > > > Cheers,
> > > > >
> > > > > T.H.Hsieh
> > > > >
> > > > >
> > > > > > On 05/03/2021 11:48, Tung-Han Hsieh via lustre-discuss wrote:
> > > > > > > Dear Hans,
> > > > > > >
> > > > > > > Thank you very much. Replacing the OST is new to me and very
> > > > > > > useful. We will try it next time.
> > > > > > >
> > > > > > > So, according to the description in the manual, to replace the
> > > > > > > OST we probably need to:
> > > > > > >
> > > > > > > 1. Lock the old OST (e.g., chome-OST0008) such that it will not
> > > > > > >     create new files (run in the MDT server):
> > > > > > >
> > > > > > >     lctl set_param osc.chome-OST0008-osc-MDT0000.max_create_count=0
> > > > > > >
> > > > > > > 2. Locate the list of files in the old OST (e.g., chome-OST0008):
> > > > > > >     (run in the client):
> > > > > > >
> > > > > > >     lfs find --obd chome-OST0008_UUID /home > /tmp/OST0008.txt
> > > > > > >
> > > > > > > 3. Migrate the listed files in /tmp/OST0008.txt out of the old
> > > > > > >     OST (run in the client).
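> > > > > > >
> > > > > > >     For example (a sketch, feeding the saved list to lfs_migrate):
> > > > > > >
> > > > > > >     lfs_migrate -y < /tmp/OST0008.txt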
> > > > > > >
> > > > > > > 4. Remove the old OST temporarily (run in the MDT server):
> > > > > > >
> > > > > > >     lctl set_param osc.chome-OST0008-osc-MDT0000.active=0
> > > > > > >
> > > > > > >     (Note: should use "set_param" instead of "conf_param")
> > > > > > >
> > > > > > > 5. Unmount the old OST partition (run in the old OST server)
> > > > > > >
> > > > > > > 6. Prepare the new OST for replacement by mkfs.lustre with the
> > > > > > >     --replace option, and set the index to the old OST index
> > > > > > >     (e.g., 0x8): (run in the new OST server)
> > > > > > >
> > > > > > >     mkfs.lustre --ost --mgsnode=XXXXXX --index=0x8 --replace <device_name>
> > > > > > >
> > > > > > > 7. Mount the new OST (run in the new OST server).
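> > > > > > >
> > > > > > >     For example (a sketch; the mount point is hypothetical):
> > > > > > >
> > > > > > >     mount -t lustre <device_name> /mnt/ost0008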
> > > > > > >
> > > > > > >
> > > > > > > Best Regards,
> > > > > > >
> > > > > > > T.H.Hsieh
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Mar 04, 2021 at 04:59:54PM +0100, Hans Henrik Happe via
> > > > > > > lustre-discuss wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > The manual describes this:
> > > > > > > >
> > > > > > > > https://doc.lustre.org/lustre_manual.xhtml#lustremaint.remove_ost
> > > > > > > >
> > > > > > > > There is a note telling you that it will still be there, but
> > > > > > > > can be replaced.
> > > > > > > >
> > > > > > > > Hope you migrated your data away from the OST as well; otherwise
> > > > > > > > you would have lost it.
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Hans Henrik
> > > > > > > >
> > > > > > > > On 03.03.2021 11.22, Tung-Han Hsieh via lustre-discuss wrote:
> > > > > > > > > Dear All,
> > > > > > > > >
> > > > > > > > > Here is a question about how to remove an OST completely
> > > > > > > > > without
> > > > > > > > > restarting the Lustre file system. Our Lustre version is
> > > > > > > > > 2.12.6.
> > > > > > > > >
> > > > > > > > > We did the following steps to remove the OST:
> > > > > > > > >
> > > > > > > > > 1. Lock the OST (e.g., chome-OST0008) such that it will not
> > > > > > > > >     create new files (run in the MDT server):
> > > > > > > > >
> > > > > > > > >     lctl set_param osc.chome-OST0008-osc-MDT0000.max_create_count=0
> > > > > > > > >
> > > > > > > > > 2. Locate the list of files in the target OST (e.g., chome-OST0008):
> > > > > > > > >     (run in the client):
> > > > > > > > >
> > > > > > > > >     lfs find --obd chome-OST0008_UUID /home
> > > > > > > > >
> > > > > > > > > 3. Remove OST (run in the MDT server):
> > > > > > > > >     lctl conf_param osc.chome-OST0008-osc-MDT0000.active=0
> > > > > > > > >
> > > > > > > > > 4. Unmount the OST partition (run in the OST server)
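> > > > > > > > >
> > > > > > > > >     After step 4, the removed OST should show as inactive
> > > > > > > > >     (or not at all) on a client, e.g. (a sketch):
> > > > > > > > >
> > > > > > > > >     lfs osts /home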
> > > > > > > > >
> > > > > > > > > After that, the total size of the Lustre file system
> > > > > > > > > decreased, and everything looked fine. However, without
> > > > > > > > > restarting (i.e., rebooting the Lustre MDT / OST servers),
> > > > > > > > > we still feel that the removed OST still exists. For
> > > > > > > > > example, on the MDT:
> > > > > > > > >
> > > > > > > > > # lctl get_param osc.*.active
> > > > > > > > > osc.chome-OST0000-osc-MDT0000.active=1
> > > > > > > > > osc.chome-OST0001-osc-MDT0000.active=1
> > > > > > > > > osc.chome-OST0002-osc-MDT0000.active=1
> > > > > > > > > osc.chome-OST0003-osc-MDT0000.active=1
> > > > > > > > > osc.chome-OST0008-osc-MDT0000.active=0
> > > > > > > > > osc.chome-OST0010-osc-MDT0000.active=1
> > > > > > > > > osc.chome-OST0011-osc-MDT0000.active=1
> > > > > > > > > osc.chome-OST0012-osc-MDT0000.active=1
> > > > > > > > > osc.chome-OST0013-osc-MDT0000.active=1
> > > > > > > > > osc.chome-OST0014-osc-MDT0000.active=1
> > > > > > > > >
> > > > > > > > > We still see chome-OST0008. And in the dmesg of the MDT, we
> > > > > > > > > see a lot of:
> > > > > > > > >
> > > > > > > > > LustreError: 4313:0:(osp_object.c:594:osp_attr_get()) chome-OST0008-osc-MDT0000:osp_attr_get update error [0x100080000:0x10a54c:0x0]: rc = -108
> > > > > > > > >
> > > > > > > > > In addition, when running LFSCK on the MDT server:
> > > > > > > > >
> > > > > > > > >       lctl lfsck_start -A
> > > > > > > > >
> > > > > > > > > even after all the MDT and OST work has completed, running
> > > > > > > > > the following (on the MDT server):
> > > > > > > > >
> > > > > > > > >       lctl get_param mdd.*.lfsck_layout
> > > > > > > > >
> > > > > > > > > still shows that the status is not completed:
> > > > > > > > >
> > > > > > > > > mdd.chome-MDT0000.lfsck_layout=
> > > > > > > > > name: lfsck_layout
> > > > > > > > > magic: 0xb1732fed
> > > > > > > > > version: 2
> > > > > > > > > status: partial
> > > > > > > > > flags: incomplete
> > > > > > > > > param: all_targets
> > > > > > > > > last_completed_time: 1614762495
> > > > > > > > > time_since_last_completed: 4325 seconds
> > > > > > > > > ....
> > > > > > > > >
> > > > > > > > > We suspect that the "incomplete" part might be due to the
> > > > > > > > > already removed chome-OST0008.
> > > > > > > > >
> > > > > > > > > Is there any way to completely remove chome-OST0008 from
> > > > > > > > > the Lustre file system, given that the OST device has
> > > > > > > > > already been reformatted for other usage?
> > > > > > > > >
> > > > > > > > > Thanks very much.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > T.H.Hsieh

