[lustre-discuss] Correct procedure for OST replacement

Redl, Robert Robert.Redl at lmu.de
Tue Oct 25 22:40:10 PDT 2022


Dear Etienne, 

Thanks a lot! We do not actually see the MDS crashes described in LU-15000, but we do of course have several index gaps caused by llog_cancel. 

Is it necessary to have this patch on all servers, or is only the MGS affected? 

About mkfs.lustre --replace: why is --replace required if all traces of the old OST have been removed from the config logs? Are previously used indices stored somewhere else? 
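
For reference, we format the replacement OST roughly like this (the dataset name "ostpool/ost7" is a placeholder for our actual ZFS dataset; fsname, index, and MGS NID are from our setup):

  mkfs.lustre --ost --backfstype=zfs --fsname=project --index=7 --replace \
      --mgsnode=10.163.52.14@tcp ostpool/ost7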

Best regards,
Robert

> On 25.10.2022, at 14:15, Etienne Aujames <eaujames at ddn.com> wrote:
> 
> Hello,
> 
> I think you hit the following bug:
> https://jira.whamcloud.com/browse/LU-15000 MDS crashes with
> (osp_dev.c:1404:osp_obd_connect()) ASSERTION( osp->opd_connects == 1 )
> failed
> 
> Stephane Thiell reported this issue and fixed it by patching his 2.12.7
> version with https://review.whamcloud.com/46552 (2.15 backport:  
> https://review.whamcloud.com/47515).
> 
> A backport has been submitted for the b2_15 branch but has not yet landed: 
> https://review.whamcloud.com/c/fs/lustre-release/+/48898
> 
> You could also check his LAD'22 presentation about removing OSTs (lctl
> del_ost):
> "A filesystem coming of age: live hardware upgrade practices at
> Stanford Research Computing" (
> https://www.eofs.eu/_media/events/lad22/2.5-stanfordrc_s_thiell.pdf)
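> 
> A rough sketch of its use on the MGS node (double-check the exact
> syntax on your version):
> 
>   lctl del_ost --target project-OST0007
> 
> It cancels all config log records referencing that OST in one step,
> which avoids manual llog_cancel surgery.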
> 
> Etienne AUJAMES
> 
> On Tue, 2022-10-25 at 10:12 +0000, Redl, Robert wrote:
>> Dear Lustre Experts,
>> 
>> some time ago we removed an OST. We followed the instructions in the
>> documentation (
>> https://doc.lustre.org/lustre_manual.xhtml#lustremaint.remove_ost
>> ), including cleaning up the config logs of all related entries using
>> llog_cancel. After the removal the system worked normally. 
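>> 
>> The cancellation itself was done roughly like this (a sketch; the
>> actual record indices differ per log):
>> 
>>   lctl --device MGS llog_print project-client
>>   lctl --device MGS llog_cancel project-client --log_idx=<index>
>> 
>> and likewise for the project-MDT0000 and project-MDT0001 logs.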
>> 
>> Now we are trying to add a new OST that reuses the same index. If the
>> OST is created with mkfs.lustre --replace, it is possible to mount
>> the OST, but the filesystem as a whole can no longer be mounted.
>> Clients see the following error messages:
>> 
>> kernel: LustreError: 70451:0:(obd_config.c:1499:class_process_config()) no device for: project-OST0007-osc-ffff914108c2e800
>> kernel: LustreError: 70451:0:(obd_config.c:2001:class_config_llog_handler()) MGC10.163.52.14@tcp: cfg command failed: rc = -22
>> kernel: Lustre:    cmd=cf00b 0:project-OST0007-osc  1:10.163.52.20@tcp
>> kernel: LustreError: 1760:0:(mgc_request.c:612:do_requeue()) failed processing log: -22
>> 
>> In order to make the filesystem mountable again, all log entries
>> created by mounting the OST must be removed using llog_cancel.
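>> 
>> The stale records can be located with something like:
>> 
>>   lctl --device MGS llog_print project-client | grep OST0007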
>> 
>> If the OST is created with mkfs.lustre without --replace, then the
>> OST itself cannot be mounted. The following errors are shown:
>> 
>> kernel: LustreError: 140-5: Server project-OST0007 requested index 7, but that index is already in use. Use --writeconf to force
>> kernel: LustreError: 7302:0:(mgs_handler.c:503:mgs_target_reg()) Failed to write project-OST0007 log (-98)
>> 
>> Since the --writeconf suggested in the error message requires a full
>> shutdown of the filesystem, we would like to avoid it.
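>> 
>> As we understand it, --writeconf would mean unmounting all clients
>> and targets and regenerating the configuration logs, roughly:
>> 
>>   tunefs.lustre --writeconf <pool>/<dataset>   # on every MDT and OST
>> 
>> followed by remounting the MGS/MDTs before the OSTs.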
>> 
>> I wonder whether we overlooked something when the OST was removed.
>> The config logs for project-client, project-MDT0000, and
>> project-MDT0001 no longer show any traces of the old OST. Is there
>> anything more that needs to be done to make Lustre forget that an OST
>> with a given index ever existed?
>> 
>> Lustre version: 2.15.1 with ZFS backend.
>> 
>> Thanks a lot!
>> Robert
>> 
