[lustre-discuss] Correct procedure for OST replacement

Tung-Han Hsieh thhsieh at twcp1.phys.ntu.edu.tw
Wed Oct 26 17:51:44 PDT 2022


Dear Redl, Robert,

On Wed, Oct 26, 2022 at 02:37:12PM +0000, Redl, Robert wrote:
> Dear Etienne, 
> 
> thanks a lot for the detailed explanation! I will try out the patch at the next opportunity.
> 
> @Tung-Han Hsieh: I think the issue that indices of old OSTs remain until --writeconf is used is solved by the new command lctl del_ost or the older lctl llog_cancel. Both remove entries from the configuration log.

Thank you for the comment. I haven't tested the "lctl del_ost" or
"lctl llog_cancel" operations yet; I will look into them.

> But I would also be interested if someone could comment on whether or not it is a good idea to reuse old indices of removed OSTs. We have meanwhile done that a few times, but as Thomas Roth pointed out, the LAD22 talk about del_ost mentioned that old indices were, in that particular case, not reused. 

In my previous experience, when an OST was permanently removed without
further steps such as "lctl del_ost" or "lctl llog_cancel", running

    lctl get_param osc.*.ost_conn_uuid

on the MGS/MDT (our MGS and MDT are on the same device) still showed
the index of the removed OST. Reusing that index for a new OST device
therefore caused problems. But after running "tunefs.lustre
--writeconf", the index was completely removed from the MGS/MDT, and
we verified that it could then be reused for a new OST device.
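
For context, the writeconf step we ran is the standard procedure from
the manual. It requires unmounting clients and all targets first, and
the target names below are placeholders (we use a ZFS backend, so the
targets are pool/dataset names):

    # with all clients and all targets unmounted:
    tunefs.lustre --writeconf mgtpool/mgt-mdt0   # on the combined MGS/MDT
    tunefs.lustre --writeconf ostpool/ost0       # on every OST, each OSS

    # then remount in order: MGT/MDT first, then OSTs, then clients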

> I think there was a mail on the mailing list a few months ago where someone asked if gaps in the OST indices are a problem. I haven't found this mail again, but I think that Andreas Dilger answered that gaps are not a problem but are untested. Do I remember that correctly? Could someone comment on that question?
> 
> Best regards,
> Robert

From our previous tests, index gaps among the OSTs did not cause
problems. An index can be used for a new OST device only if it does
not show up on the MGS/MDT when checking:

    lctl get_param osc.*.ost_conn_uuid

no matter whether it is a completely new number or one previously used
by an already-removed OST.
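
In other words, once the index no longer shows up in that output, a
new OST can be formatted with it. A sketch of what we ran (the fsname,
index, and pool names are placeholders, and the MGS NID is taken from
the error logs quoted below):

    # confirm the index is really gone on the MGS/MDT
    lctl get_param osc.*.ost_conn_uuid | grep OST0007

    # format the new OST reusing the freed index (ZFS backend example)
    mkfs.lustre --ost --fsname=project --index=7 \
        --mgsnode=10.163.52.14@tcp --backfstype=zfs ostpool/ost7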


Best Regards,

T.H.Hsieh


> > On 26.10.2022 at 11:15, Etienne Aujames <eaujames at ddn.com> wrote:
> > 
> > Hi,
> > 
> > "mkfs.lustre --replace", is used to replace an existing OST in MGS
> > configurations (CONFIGS/*-{client,MDT*}). It will read the existing
> > configuration on the MGS for the given index, copy it locally. Then it
> > will negotiate LAST_IDs (last object id for each sequence) with MDTs
> > (the OST should update the last object ids with those registered on the
> > MDTs to avoid overlaps with existing objects).
> > 
> > In your case, if you follow the procedure to permanently remove an OST
> > via llog_cancel or "lctl del_ost", you should not have any trace of the
> > old OST in your configuration (as if it never existed). So you should
> > not use "mkfs.lustre --replace".
> > 
> > With LU-15000, the local copy of the MDT configuration is not
> > (correctly) updated from the MGS one. This is because you canceled
> > indexes in the configuration, and those canceled records were not
> > copied to the local one.
> > This messes up the llog indexes between the MGS and the local MDT
> > copies.
> > 
> > When you add an OST, the MDT configurations on the MGS are updated
> > (new records are added to declare the new osp and new connections for
> > the OST).
> > The MDTs then try to read only the new indexes in the MGS
> > configuration, but the last llog indexes of the two configurations no
> > longer match: the MDT tries to read and apply older MGS records.
> > 
> > So you have to apply the patch on every server.
> > 
> > Etienne 
> > 
> > On Wed, 2022-10-26 at 05:40 +0000, Redl, Robert wrote:
> >> Dear Etienne, 
> >> 
> >> thanks a lot! We actually do not have MDS crashes as described in
> >> LU-15000, but we do of course have several index gaps caused by
> >> llog_cancel.
> >> 
> >> Is it necessary to have this patch on all servers, or is only the MGS
> >> affected? 
> >> 
> >> About mkfs.lustre --replace: why is the --replace required if all
> >> traces of the old OST have been removed from the config log? Are
> >> indices that have been used before stored somewhere else? 
> >> 
> >> Best regards,
> >> Robert
> >> 
> >>> On 25.10.2022 at 14:15, Etienne Aujames <eaujames at ddn.com> wrote:
> >>> 
> >>> Hello,
> >>> 
> >>> I think you hit the following bug:
> >>> https://jira.whamcloud.com/browse/LU-15000
> >>> "MDS crashes with (osp_dev.c:1404:osp_obd_connect())
> >>> ASSERTION( osp->opd_connects == 1 ) failed"
> >>> 
> >>> Stephane Thiell reported this issue and fixed it by patching his
> >>> 2.12.7 version with https://review.whamcloud.com/46552
> >>> (2.15 backport: https://review.whamcloud.com/47515).
> >>> 
> >>> A backport has been submitted for the b2_15 branch but not yet landed:
> >>> https://review.whamcloud.com/c/fs/lustre-release/+/48898
> >>> 
> >>> 
> >>> You could also check his LAD presentation about removing OSTs
> >>> (lctl del_ost): "A filesystem coming of age: live hardware upgrade
> >>> practices at Stanford Research Computing"
> >>> (https://www.eofs.eu/_media/events/lad22/2.5-stanfordrc_s_thiell.pdf)
> >>> 
> >>> Etienne AUJAMES
> >>> 
> >>> On Tue, 2022-10-25 at 10:12 +0000, Redl, Robert wrote:
> >>>> Dear Lustre Experts,
> >>>> 
> >>>> some time ago we removed an OST. We followed the instructions
> >>>> from the documentation
> >>>> (https://doc.lustre.org/lustre_manual.xhtml#lustremaint.remove_ost),
> >>>> including cleaning the logs of all related entries using
> >>>> llog_cancel. After the removal the system worked normally.
> >>>> 
> >>>> Now we are trying to add a new OST reusing the same index. If the
> >>>> OST
> >>>> is created with mkfs.lustre --replace, then it is possible to
> >>>> mount
> >>>> the OST, but it is not possible to mount the whole filesystem
> >>>> anymore. A client would see the following error message:
> >>>> 
> >>>> kernel: LustreError:
> >>>> 70451:0:(obd_config.c:1499:class_process_config()) no device for:
> >>>> project-OST0007-osc-ffff914108c2e800
> >>>> kernel: LustreError:
> >>>> 70451:0:(obd_config.c:2001:class_config_llog_handler()) 
> >>>> MGC10.163.52.14 at tcp: cfg command failed: rc = -22
> >>>> kernel: Lustre:    cmd=cf00b 0:project-OST0007-osc  1:
> >>>> 10.163.52.20 at tcp
> >>>> kernel: LustreError: 1760:0:(mgc_request.c:612:do_requeue())
> >>>> failed
> >>>> processing log: -22
> >>>> 
> >>>> In order to make the filesystem mountable again, all log entries
> >>>> created by mounting the OST must be removed using llog_cancel.
> >>>> 
> >>>> If the OST is created using mkfs.lustre without --replace, then
> >>>> the
> >>>> OST itself is not mountable. The following error message is
> >>>> shown:
> >>>> 
> >>>> kernel: LustreError: 140-5: Server project-OST0007 requested
> >>>> index 7,
> >>>> but that index is already in use. Use --writeconf to force
> >>>> kernel: LustreError: 7302:0:(mgs_handler.c:503:mgs_target_reg())
> >>>> Failed to write project-OST0007 log (-98)
> >>>> 
> >>>> Given that the --writeconf suggested in the error message
> >>>> requires a
> >>>> full shutdown of the system, we would like to avoid that.
> >>>> 
> >>>> I wonder if we maybe overlooked something when the OST was
> >>>> removed.
> >>>> The logs for project-client, project-MDT0000, and project-MDT0001 
> >>>> are
> >>>> not showing any traces of the old OST anymore. Is there anything
> >>>> more
> >>>> that needs to be done to make Lustre forget that an OST with a
> >>>> given
> >>>> index existed at some point?
> >>>> 
> >>>> Lustre Version: 2.15.1, ZFS-backend.
> >>>> 
> >>>> Thanks a lot!
> >>>> Robert
> >>>> 