[lustre-discuss] New client mounts fail after deactivating OSTs

Brad Merchant bmerchant at cambridgecomputer.com
Tue Jul 11 14:36:26 PDT 2023


We recreated the issue in a test cluster, and the llog_cancel steps were
definitely the cause. Clients couldn't process the llog properly on new
mounts, so the mounts failed. We had to completely clear the llog and run
--writeconf on every target to regenerate it from scratch.
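
For readers facing the same situation, the regeneration step can be sketched
roughly as below. This is a minimal Python sketch that only builds the command
strings; it does not run anything. The device paths are hypothetical, and the
ordering (MDT first, then OSTs, with all clients and targets unmounted
beforehand) is an assumption taken from the manual's section on regenerating
configuration logs, not necessarily the exact procedure used on this cluster.

```python
def writeconf_plan(mdt_devices, ost_devices):
    """Build tunefs.lustre --writeconf commands, MDT(s) first, then OSTs.

    Assumes every client and every target is already unmounted.
    """
    ordered = list(mdt_devices) + list(ost_devices)
    return [f"tunefs.lustre --writeconf {dev}" for dev in ordered]


if __name__ == "__main__":
    # Hypothetical device paths, for illustration only.
    mdts = ["/dev/mapper/mdt0"]
    osts = ["/dev/mapper/ost0", "/dev/mapper/ost1"]
    for cmd in writeconf_plan(mdts, osts):
        print(cmd)
```

After the commands have been run on each server, the targets are remounted
with the MGS/MDT first, then the OSTs, and finally the clients.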

The cluster is up and running now, but I would certainly recommend at least
revising that section of the manual.

On Mon, Jul 10, 2023 at 5:22 PM Brad Merchant <
bmerchant at cambridgecomputer.com> wrote:

> We deactivated half of our 32 OSTs after draining them. We followed the steps
> in section 14.9.3 of the Lustre manual
>
> https://doc.lustre.org/lustre_manual.xhtml#lustremaint.remove_ost
>
> After running the steps in subhead "3. Deactivate the OST." on
> OST0010-OST001f, new client mounts fail with the below log messages.
> Existing client mounts seem to function correctly but are on a bit of a
> ticking timebomb because they are configured with autofs.
>
> The llog_cancel steps are new to me, and the issues seemed to appear after
> those commands were issued (though I can't say that 100% definitively).
> Servers are running 2.12.5 and clients are on 2.14.x
>
>
> Jul 10 15:22:40 adm-sup1 kernel: LustreError:
> 26814:0:(obd_config.c:1514:class_process_config()) no device for:
> hydra-OST0010-osc-ffff8be5340c2000
> Jul 10 15:22:40 adm-sup1 kernel: LustreError:
> 26814:0:(obd_config.c:2038:class_config_llog_handler())
> MGC172.16.100.101@o2ib: cfg command failed: rc = -22
> Jul 10 15:22:40 adm-sup1 kernel: Lustre:    cmd=cf00f 0:hydra-OST0010-osc
>  1:osc.active=0
> Jul 10 15:22:40 adm-sup1 kernel: LustreError: 15b-f: MGC172.16.100.101@o2ib:
> Configuration from log hydra-client failed from MGS -22. Check client and
> MGS are on compatible version.
> Jul 10 15:22:40 adm-sup1 kernel: Lustre: hydra: root_squash is set to 99:99
> Jul 10 15:22:40 adm-sup1 systemd-udevd[26823]: Process '/usr/sbin/lctl
> set_param 'llite.hydra-ffff8be5340c2000.nosquash_nids=192.168.80.84@tcp
> 192.168.80.122@tcp 192.168.80.21@tcp 172.16.90.11@o2ib 172.16.100.211@o2ib
> 172.16.100.212@o2ib 172.16.100.213@o2ib 172.16.100.214@o2ib
> 172.16.100.215@o2ib 172.16.90.51@o2ib'' failed with exit code 2.
> Jul 10 15:22:40 adm-sup1 kernel: Lustre: Unmounted hydra-client
> Jul 10 15:22:40 adm-sup1 kernel: LustreError:
> 26803:0:(obd_mount.c:1680:lustre_fill_super()) Unable to mount  (-22)
>
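
The log excerpt above can be summarized with a small script. This is a minimal
Python sketch, not a Lustre tool: the function name and the sample text are
illustrative, and it assumes the messages keep the wording shown in this
message. It strips the email quote markers, pulls out the device named in the
"no device for:" error, and translates the numeric rc into its errno name.

```python
import errno
import re

NO_DEVICE = re.compile(r"no device for:\s*(\S+)")
CFG_FAILED = re.compile(r"cfg command failed: rc = (-?\d+)")


def summarize_mount_errors(log_text):
    """Return (missing_devices, errno_names) found in a client kernel log."""
    # Strip email quote markers so wrapped messages match across newlines.
    cleaned = re.sub(r"^> ?", "", log_text, flags=re.MULTILINE)
    devices = NO_DEVICE.findall(cleaned)
    names = [
        errno.errorcode.get(abs(int(rc)), "unknown")
        for rc in CFG_FAILED.findall(cleaned)
    ]
    return devices, names


# Sample copied from the quoted log, including the email line wraps.
SAMPLE = """\
> Jul 10 15:22:40 adm-sup1 kernel: LustreError:
> 26814:0:(obd_config.c:1514:class_process_config()) no device for:
> hydra-OST0010-osc-ffff8be5340c2000
> Jul 10 15:22:40 adm-sup1 kernel: LustreError:
> 26814:0:(obd_config.c:2038:class_config_llog_handler())
> MGC172.16.100.101@o2ib: cfg command failed: rc = -22
"""

if __name__ == "__main__":
    print(summarize_mount_errors(SAMPLE))
```

Read this way, the log suggests the client was asked to apply osc.active=0 to
hydra-OST0010-osc, found no such device in its configuration, and aborted the
mount with an invalid-argument error.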