[lustre-discuss] New client mounts fail after deactivating OSTs

Andreas Dilger adilger at whamcloud.com
Tue Jul 18 16:55:23 PDT 2023


Brian,
Please file a ticket in LUDOC with details of how the manual should be updated, ideally including a patch. :-)

Cheers, Andreas

On Jul 11, 2023, at 15:39, Brad Merchant <bmerchant at cambridgecomputer.com> wrote:


We reproduced the issue in a test cluster, and the llog_cancel steps were definitely the cause. Clients couldn't process the llog properly on new mounts, so the mounts failed. We had to completely clear the llog and run --writeconf on every target to regenerate it from scratch.

The cluster is up and running now, but I would certainly recommend at least revising that section of the manual.

On Mon, Jul 10, 2023 at 5:22 PM Brad Merchant <bmerchant at cambridgecomputer.com> wrote:
We deactivated half of our 32 OSTs after draining them, following the steps in section 14.9.3 of the Lustre manual:

https://doc.lustre.org/lustre_manual.xhtml#lustremaint.remove_ost

After running the steps under the subhead "3. Deactivate the OST." on OST0010-OST001f, new client mounts fail with the log messages below. Existing client mounts seem to function correctly, but they are a bit of a ticking time bomb: they are configured with autofs, so any remount would presumably hit the same failure.
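For readers less used to Lustre's hexadecimal target indices, the range OST0010-OST001f covers 16 targets, which matches "half of 32". A minimal Python sketch of that arithmetic follows; the helper name `ost_names` is hypothetical, and the filesystem name `hydra` is taken from the log messages in this post.

```python
def ost_names(fsname, first, last):
    """Return target names for OST indices first..last (inclusive).

    Lustre target names carry the index as four hex digits,
    e.g. index 16 becomes OST0010.
    """
    return [f"{fsname}-OST{idx:04x}" for idx in range(first, last + 1)]


# Indices 0x10 through 0x1f, as given in the post.
deactivated = ost_names("hydra", 0x10, 0x1F)

print(len(deactivated))                  # 16
print(deactivated[0], deactivated[-1])   # hydra-OST0010 hydra-OST001f
```

The first name in the list, hydra-OST0010, is also the first device the failing client complains about in the log below.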

The llog_cancel steps are new to me, and the issues seemed to appear after those commands were issued (though I can't say that with 100% certainty). Servers are running 2.12.5 and clients are on 2.14.x.


Jul 10 15:22:40 adm-sup1 kernel: LustreError: 26814:0:(obd_config.c:1514:class_process_config()) no device for: hydra-OST0010-osc-ffff8be5340c2000
Jul 10 15:22:40 adm-sup1 kernel: LustreError: 26814:0:(obd_config.c:2038:class_config_llog_handler()) MGC172.16.100.101@o2ib: cfg command failed: rc = -22
Jul 10 15:22:40 adm-sup1 kernel: Lustre:    cmd=cf00f 0:hydra-OST0010-osc  1:osc.active=0
Jul 10 15:22:40 adm-sup1 kernel: LustreError: 15b-f: MGC172.16.100.101@o2ib: Configuration from log hydra-client failed from MGS -22. Check client and MGS are on compatible version.
Jul 10 15:22:40 adm-sup1 kernel: Lustre: hydra: root_squash is set to 99:99
Jul 10 15:22:40 adm-sup1 systemd-udevd[26823]: Process '/usr/sbin/lctl set_param 'llite.hydra-ffff8be5340c2000.nosquash_nids=192.168.80.84@tcp 192.168.80.122@tcp 192.168.80.21@tcp 172.16.90.11@o2ib 172.16.100.211@o2ib 172.16.100.212@o2ib 172.16.100.213@o2ib 172.16.100.214@o2ib 172.16.100.215@o2ib 172.16.90.51@o2ib'' failed with exit code 2.
Jul 10 15:22:40 adm-sup1 kernel: Lustre: Unmounted hydra-client
Jul 10 15:22:40 adm-sup1 kernel: LustreError: 26803:0:(obd_mount.c:1680:lustre_fill_super()) Unable to mount  (-22)
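As a reading aid for the log above, here is a rough Python sketch that pulls out the device the client could not find and translates the mount return code into its errno name. It is only an illustration: the regular expressions are assumptions fitted to the two lines quoted here, not a general Lustre log parser, and the function name `summarize` is hypothetical.

```python
import errno
import re

# Two lines copied from the log excerpt above.
LOG_LINES = [
    "Jul 10 15:22:40 adm-sup1 kernel: LustreError: 26814:0:(obd_config.c:1514:class_process_config()) no device for: hydra-OST0010-osc-ffff8be5340c2000",
    "Jul 10 15:22:40 adm-sup1 kernel: LustreError: 26803:0:(obd_mount.c:1680:lustre_fill_super()) Unable to mount  (-22)",
]

# Patterns fitted to these specific messages (an assumption, not a spec).
NO_DEVICE = re.compile(r"no device for: (\S+)")
MOUNT_RC = re.compile(r"Unable to mount\s+\((-?\d+)\)")


def summarize(lines):
    """Collect missing device names and symbolic mount error codes."""
    missing, codes = [], []
    for line in lines:
        match = NO_DEVICE.search(line)
        if match:
            missing.append(match.group(1))
        match = MOUNT_RC.search(line)
        if match:
            rc = int(match.group(1))
            # Kernel return codes are negated errno values.
            codes.append(errno.errorcode.get(-rc, str(rc)))
    return missing, codes


missing_devices, mount_errors = summarize(LOG_LINES)
print(missing_devices)  # ['hydra-OST0010-osc-ffff8be5340c2000']
print(mount_errors)     # ['EINVAL']
```

In other words, the client is told to set osc.active=0 on an OSC device that was never created, and the resulting -22 (EINVAL) aborts the whole mount.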


