[lustre-discuss] Correct procedure for OST replacement

Redl, Robert Robert.Redl at lmu.de
Wed Oct 26 07:37:12 PDT 2022


Dear Etienne, 

thanks a lot for the detailed explanation! I will try out the patch at the next opportunity.

@Tung-Han Hsieh: I think the issue that the indices of old OSTs remain until --writeconf is used is solved by the new command lctl del_ost or the older lctl llog_cancel. Both remove entries from the configuration log.
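For illustration only, selecting the configuration-log records that belong to a removed OST could be sketched in Python as below. This assumes the records were dumped with `lctl llog_print` and that each record appears on one line in a YAML-like form; that format, the helper name `records_for_target` and the sample records are assumptions, not taken from this thread. The result would be the list of indices one would hand to `llog_cancel`; `lctl del_ost` is meant to do this selection itself.

```python
import re

# ASSUMPTION: one record per line, in the YAML-like style printed by
# `lctl llog_print`, e.g.
#   - { index: 17, event: attach, device: project-OST0007-osc, type: osc }
# Field names can differ between Lustre versions; check your own output.
INDEX_RE = re.compile(r"\bindex:\s*(\d+)")


def records_for_target(llog_lines, target):
    """Return the llog indices of all records whose text mentions `target`.

    `target` should be the full target name (e.g. "project-OST0007") so that
    it matches both "...-osc" device names and "..._UUID" strings.
    """
    indices = []
    for line in llog_lines:
        match = INDEX_RE.search(line)
        if match and target in line:
            indices.append(int(match.group(1)))
    return indices


if __name__ == "__main__":
    # Made-up sample records, not taken from a real filesystem.
    sample = [
        "- { index: 15, event: attach, device: project-OST0006-osc, type: osc }",
        "- { index: 17, event: attach, device: project-OST0007-osc, type: osc }",
        "- { index: 18, event: setup, device: project-OST0007-osc, UUID: project-OST0007_UUID }",
        "- { index: 20, event: add_osc, device: project-clilov, ost: project-OST0007_UUID }",
    ]
    print(records_for_target(sample, "project-OST0007"))  # [17, 18, 20]
```

Matching on the full target name rather than just "OST0007" keeps records of other filesystems sharing the MGS out of the result.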

But I would also be interested if someone could comment on whether or not it is a good idea to reuse the indices of removed OSTs. We have done that a few times in the meantime, but as Thomas Roth pointed out, the LAD22 talk about del_ost mentioned that in that particular case the old indices were not reused.

I think there was a mail on the mailing list a few months ago where someone asked whether gaps in the OST indices are a problem. I haven't found that mail again, but I think Andreas Dilger answered that gaps are not a problem but are untested. Do I remember that correctly? Could someone comment on that question?

Best regards,
Robert


> On 26.10.2022 at 11:15, Etienne Aujames <eaujames at ddn.com> wrote:
> 
> Hi,
> 
> "mkfs.lustre --replace", is used to replace an existing OST in MGS
> configurations (CONFIGS/*-{client,MDT*}). It will read the existing
> configuration on the MGS for the given index, copy it locally. Then it
> will negotiate LAST_IDs (last object id for each sequence) with MDTs
> (the OST should update the last object ids with those registered on the
> MDTs to avoid overlaps with existing objects).
> 
> In your case, if you follow the procedure to permanently remove an OST
> via llog_cancel or "lctl del_ost", you should not have any trace of the
> old OST in your configuration (as if it had never existed). So you should
> not use "mkfs.lustre --replace".
> 
> With LU-15000, the local copy of the MDT configuration is not
> (correctly) updated from the MGS one. This is because you canceled
> indexes in the configuration, and those canceled records were not copied
> to the local one.
> This messes up the llog indexes between the MGS and the local MDT copies.
> 
> When you add an OST, the MDT configurations on the MGS are updated (new
> records are added to declare the new OSP and the new connections for the OST).
> Then the MDTs try to read only the new indexes in the MGS configuration, but
> the last llog indexes of the two configurations are not the same
> anymore: the MDT tries to read and apply older records from the MGS.
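The index mismatch described above can be illustrated with a simplified Python model. It assumes that (1) canceled records on the MGS keep their llog indices, leaving gaps, (2) the MDT's local copy stores only the surviving records, numbered consecutively, and (3) an update fetches every MGS record whose index is greater than the last index of the local copy. All names are hypothetical and the model is not Lustre code.

```python
# Simplified model (hypothetical names, not Lustre code) of the llog index
# mismatch between the MGS configuration log and an MDT's local copy.


def mgs_log_after_cancel(num_records, canceled):
    """MGS log with records 1..num_records; canceled ones leave index gaps."""
    return [(i, f"rec{i}") for i in range(1, num_records + 1) if i not in canceled]


def local_copy(mgs_log):
    """ASSUMPTION: the local copy renumbers surviving records consecutively."""
    return [(new_idx, rec) for new_idx, (_, rec) in enumerate(mgs_log, start=1)]


def records_to_apply(mgs_log, last_seen_index):
    """Incremental update: every MGS record with index > last_seen_index."""
    return [rec for idx, rec in mgs_log if idx > last_seen_index]


if __name__ == "__main__":
    mgs = mgs_log_after_cancel(10, canceled={4, 5, 6})  # old OST records canceled
    local = local_copy(mgs)                              # 7 records, indices 1..7
    mgs.append((11, "rec11"))                            # new OST added on the MGS
    print(records_to_apply(mgs, last_seen_index=local[-1][0]))
    # ['rec8', 'rec9', 'rec10', 'rec11']: rec8..rec10 were already applied
```

With three canceled records the local copy ends at index 7 while the MGS log ends at 10, so adding record 11 also makes the MDT re-apply records 8 to 10. Re-applying setup records for devices that already exist would plausibly fit the `opd_connects == 1` assertion reported in LU-15000.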
> 
> So you have to apply the patch on every server.
> 
> Etienne 
> 
> On Wed, 2022-10-26 at 05:40 +0000, Redl, Robert wrote:
>> Dear Etienne, 
>> 
>> thanks a lot! We actually do not have MDS crashes as described in
>> LU-15000, but we do of course have several index gaps caused by
>> llog_cancel.
>> 
>> Is it necessary to have this patch on all servers, or is only the MGS
>> affected? 
>> 
>> About mkfs.lustre --replace: why is the --replace required if all
>> traces of the old OST have been removed from the config log? Are
>> indices that have been used before stored somewhere else? 
>> 
>> Best regards,
>> Robert
>> 
>>> On 25.10.2022 at 14:15, Etienne Aujames <eaujames at ddn.com> wrote:
>>> 
>>> Hello,
>>> 
>>> I think you hit the following bug:
>>> https://jira.whamcloud.com/browse/LU-15000
>>> MDS crashes with
>>> (osp_dev.c:1404:osp_obd_connect()) ASSERTION( osp->opd_connects == 1 ) failed
>>> 
>>> Stephane Thiell reported this issue and fixed it by patching his
>>> 2.12.7 version with https://review.whamcloud.com/46552
>>> (2.15 backport: https://review.whamcloud.com/47515).
>>> 
>>> A backport has been submitted for the b2_15 branch but has not landed yet:
>>> https://review.whamcloud.com/c/fs/lustre-release/+/48898
>>> 
>>> 
>>> You could also check his LAD presentation about removing OSTs
>>> (lctl del_ost):
>>> "A filesystem coming of age: live hardware upgrade practices at
>>> Stanford Research Computing"
>>> (https://www.eofs.eu/_media/events/lad22/2.5-stanfordrc_s_thiell.pdf)
>>> 
>>> Etienne AUJAMES
>>> 
>>> On Tue, 2022-10-25 at 10:12 +0000, Redl, Robert wrote:
>>>> Dear Lustre Experts,
>>>> 
>>>> some time ago we removed an OST. We followed the instructions from
>>>> the documentation
>>>> (https://doc.lustre.org/lustre_manual.xhtml#lustremaint.remove_ost),
>>>> including cleaning up all related entries from the logs using
>>>> llog_cancel. After the removal the system worked normally.
>>>> 
>>>> Now we are trying to add a new OST reusing the same index. If the
>>>> OST is created with mkfs.lustre --replace, then it is possible to
>>>> mount the OST, but it is no longer possible to mount the whole
>>>> filesystem. A client would see the following error messages:
>>>> 
>>>> kernel: LustreError: 70451:0:(obd_config.c:1499:class_process_config()) no device for: project-OST0007-osc-ffff914108c2e800
>>>> kernel: LustreError: 70451:0:(obd_config.c:2001:class_config_llog_handler()) MGC10.163.52.14@tcp: cfg command failed: rc = -22
>>>> kernel: Lustre:    cmd=cf00b 0:project-OST0007-osc  1:10.163.52.20@tcp
>>>> kernel: LustreError: 1760:0:(mgc_request.c:612:do_requeue()) failed processing log: -22
>>>> 
>>>> In order to make the filesystem mountable again, all log entries
>>>> created by mounting the OST must be removed using llog_cancel.
>>>> 
>>>> If the OST is created using mkfs.lustre without --replace, then the
>>>> OST itself cannot be mounted. The following error messages are
>>>> shown:
>>>> 
>>>> kernel: LustreError: 140-5: Server project-OST0007 requested index 7, but that index is already in use. Use --writeconf to force
>>>> kernel: LustreError: 7302:0:(mgs_handler.c:503:mgs_target_reg()) Failed to write project-OST0007 log (-98)
>>>> 
>>>> Given that the --writeconf suggested in the error message requires
>>>> a full shutdown of the system, we would like to avoid it.
>>>> 
>>>> I wonder whether we overlooked something when the OST was removed.
>>>> The logs for project-client, project-MDT0000, and project-MDT0001
>>>> no longer show any trace of the old OST. Is there anything more
>>>> that needs to be done to make Lustre forget that an OST with a
>>>> given index existed at some point?
>>>> 
>>>> Lustre Version: 2.15.1, ZFS-backend.
>>>> 
>>>> Thanks a lot!
>>>> Robert
>>>> 
>> 
>> 
