[lustre-discuss] Correct procedure for OST replacement

Redl, Robert Robert.Redl at lmu.de
Tue Oct 25 03:12:00 PDT 2022


Dear Lustre Experts,

Some time ago we removed an OST. We followed the instructions in the documentation (https://doc.lustre.org/lustre_manual.xhtml#lustremaint.remove_ost), including removing all related entries from the configuration logs using llog_cancel. After the removal the system worked normally.

Now we are trying to add a new OST that reuses the same index. If the OST is created with mkfs.lustre --replace, the OST itself can be mounted, but the filesystem as a whole can no longer be mounted. A client shows the following error messages:

kernel: LustreError: 70451:0:(obd_config.c:1499:class_process_config()) no device for: project-OST0007-osc-ffff914108c2e800
kernel: LustreError: 70451:0:(obd_config.c:2001:class_config_llog_handler()) MGC10.163.52.14@tcp: cfg command failed: rc = -22
kernel: Lustre:    cmd=cf00b 0:project-OST0007-osc  1:10.163.52.20@tcp
kernel: LustreError: 1760:0:(mgc_request.c:612:do_requeue()) failed processing log: -22

In order to make the filesystem mountable again, all log entries created by mounting the OST must be removed using llog_cancel.

If the OST is created using mkfs.lustre without --replace, then the OST itself is not mountable. The following error message is shown:

kernel: LustreError: 140-5: Server project-OST0007 requested index 7, but that index is already in use. Use --writeconf to force
kernel: LustreError: 7302:0:(mgs_handler.c:503:mgs_target_reg()) Failed to write project-OST0007 log (-98)
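
For reference, the two return codes in the kernel messages above (-22 from obd_config.c and mgc_request.c, -98 from mgs_handler.c) are negated Linux errno values. A minimal sketch for decoding them follows. The table is hard-coded with the Linux values, since the numbers differ on other systems, and the helper name describe_rc is made up for illustration.

```python
# Linux errno values for the two return codes seen in the kernel messages.
# Hard-coded for Linux on purpose: the numeric values differ on other systems.
LINUX_ERRNO = {
    22: ("EINVAL", "Invalid argument"),
    98: ("EADDRINUSE", "Address already in use"),
}


def describe_rc(rc: int) -> str:
    """Return a readable name for a negative kernel-style return code."""
    name, text = LINUX_ERRNO.get(abs(rc), ("UNKNOWN", "not in this table"))
    return f"rc = {rc}: {name} ({text})"


for rc in (-22, -98):
    print(describe_rc(rc))
# prints:
# rc = -22: EINVAL (Invalid argument)
# rc = -98: EADDRINUSE (Address already in use)
```

So the client rejects a configuration record as invalid, while the MGS refuses the registration because index 7 is still considered in use.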

Since the --writeconf suggested in the error message requires a full shutdown of the system, we would like to avoid it.

I wonder whether we overlooked something when the OST was removed. The configuration logs for project-client, project-MDT0000, and project-MDT0001 no longer show any traces of the old OST. Is there anything else that needs to be done to make Lustre forget that an OST with a given index existed at some point?
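
The check for leftover traces can be sketched as follows, assuming each configuration log (project-client, project-MDT0000, project-MDT0001) has first been saved to a plain text file, for example from the output of lctl llog_print on the MGS. The sketch makes no assumption about the record format and only searches for the target name. The helper names ost_name and find_traces are hypothetical.

```python
import sys


def ost_name(fsname: str, index: int) -> str:
    """Build a target name such as project-OST0007 (index as 4 hex digits)."""
    return f"{fsname}-OST{index:04x}"


def find_traces(log_text: str, target: str) -> list[tuple[int, str]]:
    """Return (line number, line) for every line of log_text mentioning target."""
    return [
        (number, line.strip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if target in line
    ]


if __name__ == "__main__":
    target = ost_name("project", 7)
    # Each command-line argument is a text file holding one saved log dump.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            hits = find_traces(handle.read(), target)
        print(f"{path}: {len(hits)} line(s) mention {target}")
        for number, line in hits:
            print(f"  {number}: {line}")
```

Saved as, say, find_traces.py (an illustrative file name), it would be run with the dump files as arguments. Note that a plain substring search only finds records that spell out the target name; it cannot show state kept elsewhere on the MGS.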

Lustre version: 2.15.1, ZFS backend.

Thanks a lot!
Robert
