[lustre-discuss] Correct procedure for OST replacement
Redl, Robert
Robert.Redl at lmu.de
Wed Oct 26 07:37:12 PDT 2022
Dear Etienne,
thanks a lot for the detailed explanation! I will try out the patch at the next opportunity.
@Tung-Han Hsieh: I think the problem that the indices of old OSTs remain until --writeconf is used is solved by the new command lctl del_ost or the older lctl llog_cancel. Both remove entries from the configuration log.
I would also be interested in comments on whether it is a good idea to reuse the indices of removed OSTs. We have done that a few times in the meantime, but as Thomas Roth pointed out, the LAD22 talk about del_ost mentioned that the old indices were not reused in that particular case.
I think there was a mail on this list a few months ago in which someone asked whether gaps in the OST indices are a problem. I have not found that mail again, but I think Andreas Dilger answered that gaps are not a problem but are untested. Do I remember that correctly? Could someone comment on that question?
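To make the question about index gaps concrete, here is a minimal sketch that lists the unused indices from a set of OST labels. It relies only on the label format seen in this thread (fsname-OSTxxxx, with a four-digit hexadecimal index); the function names and the example filesystem with ten OSTs are hypothetical.

```python
import re

# Lustre OST labels end in a four-digit hexadecimal index, e.g. project-OST0007.
OST_LABEL = re.compile(r"^(?P<fsname>[\w-]+)-OST(?P<index>[0-9a-fA-F]{4})$")


def ost_index(label):
    """Return the numeric index encoded in an OST label."""
    match = OST_LABEL.match(label)
    if match is None:
        raise ValueError(f"not an OST label: {label!r}")
    return int(match.group("index"), 16)


def index_gaps(labels):
    """Return the sorted unused indices below the highest index in use."""
    used = {ost_index(label) for label in labels}
    if not used:
        return []
    return [i for i in range(max(used)) if i not in used]


# Hypothetical example: OST0007 was removed from a filesystem with ten OSTs.
labels = [f"project-OST{i:04x}" for i in range(10) if i != 7]
print(index_gaps(labels))
```

This only reports where the gaps are; it says nothing about whether reusing a gap is safe, which is the open question above.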
Best regards,
Robert
> On 26.10.2022 at 11:15, Etienne Aujames <eaujames at ddn.com> wrote:
>
> Hi,
>
> "mkfs.lustre --replace" is used to replace an existing OST in the MGS
> configurations (CONFIGS/*-{client,MDT*}). It reads the existing
> configuration on the MGS for the given index and copies it locally. Then
> it negotiates the LAST_IDs (last object ID for each sequence) with the
> MDTs (the OST should update its last object IDs with those registered on
> the MDTs to avoid overlaps with existing objects).
>
> In your case, if you follow the procedure to permanently remove an OST
> via llog_cancel or "lctl del_ost", you should not have any trace of the
> old OST in your configuration (like it never existed). So you should
> not use "mkfs.lustre --replace".
>
> With LU-15000, the local copy of the MDT configuration is not
> (correctly) updated from the MGS one. This is because you canceled
> records in the configuration and those canceled records were not copied
> to the local one.
> This messes up the llog indexes between the MGS and the local MDT copies.
>
> When you add an OST, the MDT configurations on the MGS are updated (new
> records are added to declare the new osp and the new connections for the
> OST). The MDTs then try to read only the new indexes in the MGS
> configuration, but the last llog indexes of the two configurations are
> no longer the same: the MDT tries to read and apply older MGS records.
>
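The index mismatch described above can be illustrated with a toy model. This is a deliberately simplified sketch, not Lustre code: it assumes the local copy renumbers the surviving records when canceled ones are skipped, and all names and record counts are hypothetical.

```python
def local_last_index(mgs_indexes, cancelled, keep_gaps):
    """Last index of the MDT's local copy of the MGS configuration log.

    keep_gaps=False models the behaviour described for LU-15000 (cancelled
    records are not copied, so the copy ends at a lower index);
    keep_gaps=True models a copy that preserves the MGS indexes.
    """
    if keep_gaps:
        return max(mgs_indexes)
    return len([i for i in mgs_indexes if i not in cancelled])


def records_to_apply(mgs_indexes, cancelled, local_last):
    """Live MGS records the MDT treats as new: those above its local last index."""
    return [i for i in mgs_indexes if i > local_last and i not in cancelled]


# Hypothetical MGS log with ten records; records 4 and 5 belonged to the
# removed OST and were cancelled.
mgs = list(range(1, 11))
cancelled = {4, 5}

buggy_last = local_last_index(mgs, cancelled, keep_gaps=False)
fixed_last = local_last_index(mgs, cancelled, keep_gaps=True)

# Adding an OST appends two new records on the MGS.
mgs += [11, 12]

print(records_to_apply(mgs, cancelled, buggy_last))  # includes old records 9 and 10
print(records_to_apply(mgs, cancelled, fixed_last))  # only the new records
```

In this model the copy without gaps ends at index 8 while the MGS log ends at 10, so after the new OST is added the MDT re-reads records 9 and 10 in addition to the two new ones.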
> So you have to apply the patch on every server.
>
> Etienne
>
> On Wed, 2022-10-26 at 05:40 +0000, Redl, Robert wrote:
>> Dear Etienne,
>>
>> thanks a lot! We do not actually have MDS crashes as described in
>> LU-15000, but we do of course have several index gaps caused by
>> llog_cancel.
>>
>> Is it necessary to have this patch on all servers, or is only the MGS
>> affected?
>>
>> About mkfs.lustre --replace: why is the --replace required if all
>> traces of the old OST have been removed from the config log? Are
>> indices that have been used before stored somewhere else?
>>
>> Best regards,
>> Robert
>>
>>> On 25.10.2022 at 14:15, Etienne Aujames <eaujames at ddn.com> wrote:
>>>
>>> Hello,
>>>
>>> I think you hit the following bug:
>>> https://jira.whamcloud.com/browse/LU-15000
>>> MDS crashes with
>>> (osp_dev.c:1404:osp_obd_connect()) ASSERTION( osp->opd_connects == 1 ) failed
>>>
>>> Stephane Thiell reported this issue and fixed it by patching his
>>> 2.12.7 version with https://review.whamcloud.com/46552
>>> (2.15 backport: https://review.whamcloud.com/47515).
>>>
>>> A backport has been submitted for the b2_15 branch but has not landed yet:
>>> https://review.whamcloud.com/c/fs/lustre-release/+/48898
>>>
>>>
>>> You could also check his LAD presentation about removing OSTs
>>> (lctl del_ost):
>>> "A filesystem coming of age: live hardware upgrade practices at
>>> Stanford Research Computing"
>>> (https://www.eofs.eu/_media/events/lad22/2.5-stanfordrc_s_thiell.pdf)
>>>
>>> Etienne AUJAMES
>>>
>>> On Tue, 2022-10-25 at 10:12 +0000, Redl, Robert wrote:
>>>> Dear Lustre Experts,
>>>>
>>>> some time ago we removed an OST. We followed the instructions from
>>>> the documentation
>>>> (https://doc.lustre.org/lustre_manual.xhtml#lustremaint.remove_ost),
>>>> including cleaning up the logs from all related entries using
>>>> llog_cancel. After the removal the system worked normally.
>>>>
>>>> Now we are trying to add a new OST reusing the same index. If the
>>>> OST is created with mkfs.lustre --replace, then it is possible to
>>>> mount the OST, but it is no longer possible to mount the whole
>>>> filesystem. A client sees the following error messages:
>>>>
>>>> kernel: LustreError: 70451:0:(obd_config.c:1499:class_process_config()) no device for: project-OST0007-osc-ffff914108c2e800
>>>> kernel: LustreError: 70451:0:(obd_config.c:2001:class_config_llog_handler()) MGC10.163.52.14@tcp: cfg command failed: rc = -22
>>>> kernel: Lustre: cmd=cf00b 0:project-OST0007-osc 1:10.163.52.20@tcp
>>>> kernel: LustreError: 1760:0:(mgc_request.c:612:do_requeue()) failed processing log: -22
>>>>
>>>> In order to make the filesystem mountable again, all log entries
>>>> created by mounting the OST must be removed using llog_cancel.
>>>>
>>>> If the OST is created using mkfs.lustre without --replace, then the
>>>> OST itself is not mountable. The following error message is shown:
>>>>
>>>> kernel: LustreError: 140-5: Server project-OST0007 requested index 7, but that index is already in use. Use --writeconf to force
>>>> kernel: LustreError: 7302:0:(mgs_handler.c:503:mgs_target_reg()) Failed to write project-OST0007 log (-98)
>>>>
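The numeric codes at the end of the kernel messages above are negated Linux errno values. A minimal sketch for decoding them follows; the helper name and the regular expression are my own, and the table covers only the two codes that appear in this thread.

```python
import re

# Linux errno values for the two return codes seen in the messages above.
LINUX_ERRNO = {22: "EINVAL", 98: "EADDRINUSE"}

# A negative number at the end of the message, optionally closed by ")".
RC_AT_END = re.compile(r"(?<!\w)-(\d+)\)?\s*$")


def decode_rc(line):
    """Return the errno name for the trailing return code, or None if absent."""
    match = RC_AT_END.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return LINUX_ERRNO.get(code, f"errno {code}")


messages = [
    "cfg command failed: rc = -22",
    "failed processing log: -22",
    "Failed to write project-OST0007 log (-98)",
]
for message in messages:
    print(decode_rc(message), "<-", message)
```

So the client fails with EINVAL (invalid argument) while processing the configuration log, and the MGS refuses the registration with EADDRINUSE (address already in use), which matches the "index is already in use" text.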
>>>> Given that the --writeconf suggested in the error message requires a
>>>> full shutdown of the system, we would like to avoid that.
>>>>
>>>> I wonder if we maybe overlooked something when the OST was removed.
>>>> The logs for project-client, project-MDT0000, and project-MDT0001 are
>>>> not showing any traces of the old OST anymore. Is there anything more
>>>> that needs to be done to make Lustre forget that an OST with a given
>>>> index existed at some point?
>>>>
>>>> Lustre Version: 2.15.1, ZFS-backend.
>>>>
>>>> Thanks a lot!
>>>> Robert
>>>>