[lustre-discuss] Correct procedure for OST replacement

Tung-Han Hsieh thhsieh at twcp1.phys.ntu.edu.tw
Wed Oct 26 07:11:48 PDT 2022


Hello,

Just an experience to share. Even if we follow the correct procedure
to permanently remove an OST, the index of that OST still exists
on the MDT. The only way to remove that OST index from the MDT is to
run "tunefs.lustre --writeconf" on the MDT (and also on all OSTs),
which requires temporarily shutting down the Lustre file system.
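
The ordering behind that shutdown can be sketched as follows. This is
only a minimal sketch, assuming the regeneration order described in the
Lustre manual (all clients and targets unmounted first, then
"tunefs.lustre --writeconf" on the MDT(s) before the OSTs). The function
name and the device paths are hypothetical placeholders. The code only
builds the command strings and does not run anything.

```python
from typing import List


def writeconf_commands(mdt_devices: List[str], ost_devices: List[str]) -> List[str]:
    """Build the writeconf commands, MDTs first and OSTs after.

    Assumes every client and every target is already unmounted.
    Nothing is executed here; the command strings are only returned.
    """
    ordered = list(mdt_devices) + list(ost_devices)
    return [f"tunefs.lustre --writeconf {device}" for device in ordered]


if __name__ == "__main__":
    # Hypothetical device paths, for illustration only.
    for command in writeconf_commands(["/dev/mdt0"], ["/dev/ost0", "/dev/ost1"]):
        print(command)
```

After these commands, the targets would be mounted again (MGS/MDT
first, then OSTs, then clients) so that the configuration logs are
regenerated.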

Therefore, if we want to remove an OST and replace it with a new
storage device, we have to assign the new device a new OST index. The
"removed" OST indices can be reused only when we have the chance
to temporarily shut down the Lustre file system and run
"tunefs.lustre --writeconf" to clear and regenerate all the Lustre
internal logs.
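
As a small illustration of picking such a new index: a minimal sketch,
assuming target names follow the usual "<fsname>-OSTxxxx" pattern with
the index written as four hexadecimal digits (as in "project-OST0007"
for index 7 in the quoted thread below). The helper names are
hypothetical, and the list of indices ever used has to be tracked by
the administrator, since removed indices still count as used.

```python
from typing import Iterable


def ost_target_name(fsname: str, index: int) -> str:
    """Format an OST target name; the index is written as 4 hex digits."""
    return f"{fsname}-OST{index:04x}"


def next_fresh_index(ever_used: Iterable[int]) -> int:
    """Return an index never used before, counting removed OSTs as used."""
    return max(ever_used, default=-1) + 1


if __name__ == "__main__":
    # Indices 0..7 were used at some point, even though OST 7 was removed.
    fresh = next_fresh_index(range(8))
    print(fresh, ost_target_name("project", fresh))
```

Taking the highest index ever used plus one is just the simplest
choice; any index that was never registered would do.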

Best Regards,

T.H.hsieh

On Wed, Oct 26, 2022 at 09:15:00AM +0000, Etienne Aujames via lustre-discuss wrote:
> Hi,
> 
> "mkfs.lustre --replace" is used to replace an existing OST in the MGS
> configurations (CONFIGS/*-{client,MDT*}). It reads the existing
> configuration on the MGS for the given index and copies it locally. Then
> it negotiates LAST_IDs (the last object ID for each sequence) with the
> MDTs (the OST should update its last object IDs with those registered on
> the MDTs to avoid overlaps with existing objects).
> 
> In your case, if you follow the procedure to permanently remove an OST
> via llog_cancel or "lctl del_ost", you should not have any trace of the
> old OST in your configuration (like it never existed). So you should
> not use "mkfs.lustre --replace".
> 
> With LU-15000, the local copy of the MDT configuration is not
> (correctly) updated from the MGS one. This is because you canceled
> indexes in the configuration and those canceled records were not copied
> to the local one.
> This messes up the llog indexes between the MGS and the local MDT copies.
> 
> When you add an OST, the MDT configurations on the MGS are updated (new
> records are added to declare the new osp and the new connections for the
> OST). The MDTs then try to read only the new indexes in the MGS
> configuration, but the last llog indexes of the two configurations are
> no longer the same: the MDT tries to read and apply older MGS records.
> 
> So you have to apply the patch on every server.
> 
> Etienne 
> 
> On Wed, 2022-10-26 at 05:40 +0000, Redl, Robert wrote:
> > Dear Etienne, 
> > 
> > thanks a lot! We do not actually have MDS crashes as described in
> > LU-15000, but we do of course have several index gaps caused by
> > llog_cancel. 
> > 
> > Is it necessary to have this patch on all servers, or is only the MGS
> > affected? 
> > 
> > About mkfs.lustre --replace: why is the --replace required if all
> > traces of the old OST have been removed from the config log? Are
> > indices that have been used before stored somewhere else? 
> > 
> > Best regards,
> > Robert
> > 
> > > On 25.10.2022 at 14:15, Etienne Aujames <eaujames at ddn.com> wrote:
> > > 
> > > Hello,
> > > 
> > > I think you hit the following bug:
> > > https://jira.whamcloud.com/browse/LU-15000
> > > MDS crashes with (osp_dev.c:1404:osp_obd_connect())
> > > ASSERTION( osp->opd_connects == 1 ) failed
> > > 
> > > Stephane Thiell reported this issue and fixed it by patching his
> > > 2.12.7 version with https://review.whamcloud.com/46552
> > > (2.15 backport: https://review.whamcloud.com/47515).
> > > 
> > > A backport has been submitted for the b2_15 branch but has not yet landed:
> > > https://review.whamcloud.com/c/fs/lustre-release/+/48898
> > > 
> > > 
> > > You could also check his LAD presentation about removing OSTs
> > > (lctl del_ost):
> > > "A filesystem coming of age: live hardware upgrade practices at
> > > Stanford Research Computing"
> > > (https://www.eofs.eu/_media/events/lad22/2.5-stanfordrc_s_thiell.pdf)
> > > 
> > > Etienne AUJAMES
> > > 
> > > On Tue, 2022-10-25 at 10:12 +0000, Redl, Robert wrote:
> > > > > Dear Lustre Experts,
> > > > > 
> > > > > some time ago we removed an OST. We followed the instructions
> > > > > from the documentation
> > > > > (https://doc.lustre.org/lustre_manual.xhtml#lustremaint.remove_ost),
> > > > > including cleaning up the logs from all related entries using
> > > > > llog_cancel. After the removal the system worked normally.
> > > > 
> > > > > Now we are trying to add a new OST reusing the same index. If the
> > > > > OST is created with mkfs.lustre --replace, then it is possible to
> > > > > mount the OST, but it is no longer possible to mount the whole
> > > > > filesystem. A client would see the following error message:
> > > > 
> > > > > kernel: LustreError: 70451:0:(obd_config.c:1499:class_process_config()) no device for: project-OST0007-osc-ffff914108c2e800
> > > > > kernel: LustreError: 70451:0:(obd_config.c:2001:class_config_llog_handler()) MGC10.163.52.14@tcp: cfg command failed: rc = -22
> > > > > kernel: Lustre:    cmd=cf00b 0:project-OST0007-osc  1:10.163.52.20@tcp
> > > > > kernel: LustreError: 1760:0:(mgc_request.c:612:do_requeue()) failed processing log: -22
> > > > 
> > > > In order to make the filesystem mountable again, all log entries
> > > > created by mounting the OST must be removed using llog_cancel.
> > > > 
> > > > > If the OST is created using mkfs.lustre without --replace, then
> > > > > the OST itself is not mountable. The following error message is
> > > > > shown:
> > > > 
> > > > > kernel: LustreError: 140-5: Server project-OST0007 requested index 7, but that index is already in use. Use --writeconf to force
> > > > > kernel: LustreError: 7302:0:(mgs_handler.c:503:mgs_target_reg()) Failed to write project-OST0007 log (-98)
> > > > 
> > > > > The --writeconf suggested in the error message requires a full
> > > > > shutdown of the system, which we would like to avoid.
> > > > 
> > > > > I wonder if we maybe overlooked something when the OST was
> > > > > removed. The logs for project-client, project-MDT0000, and
> > > > > project-MDT0001 are not showing any traces of the old OST
> > > > > anymore. Is there anything more that needs to be done to make
> > > > > Lustre forget that an OST with a given index existed at some
> > > > > point?
> > > > 
> > > > Lustre Version: 2.15.1, ZFS-backend.
> > > > 
> > > > Thanks a lot!
> > > > Robert
> > > > 
> > > > _______________________________________________
> > > > lustre-discuss mailing list
> > > > lustre-discuss at lists.lustre.org
> > > > 
> > > > 
> > > > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> > > > 
> > > > 
> > > > 
> > 
> > 

