[lustre-discuss] New client mounts fail after deactivating OSTs

Nehring, Shane R [ITS] snehring at iastate.edu
Fri Nov 17 11:55:16 PST 2023


Little late to the party here, but I just ran into this myself.

I worked around it without having to regenerate everything with --writeconf.
I realize that isn't much help four months after the fact, but I figured I'd
post it here for anyone else who runs into this issue in the future.

In my case I had removed all the llog entries for the OSTs except the conf_param
entries setting osc.active=0, assuming for whatever reason that those should be
retained. That was incorrect: you'll want to remove those entries too, for each
relevant OST.
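
To spell that out: those records live in the client configuration log on the
MGS and can be listed and cancelled with lctl. A rough sketch, assuming a
filesystem named "hydra" and that the osc.active=0 record for OST0010 turns
out to sit at index 123 (both hypothetical):

  mgs# lctl --device MGS llog_print hydra-client | grep 'osc\.active=0'
  mgs# lctl --device MGS llog_cancel hydra-client --log_idx=123

Repeat for each deactivated OST, and it's worth checking the per-MDT
configuration log (e.g. hydra-MDT0000) for matching records as well.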

I've opened an issue in LUDOC with some suggestions about how phrasing might be
improved.


On Tue, 2023-07-18 at 23:55 +0000, Andreas Dilger via lustre-discuss wrote:
> Brian,
> Please file a ticket in LUDOC with details of how the manual should be
> updated. Ideally, including a patch. :-)
> 
> Cheers, Andreas
> 
> > On Jul 11, 2023, at 15:39, Brad Merchant <bmerchant at cambridgecomputer.com>
> > wrote:
> > 
> > 
> > We recreated the issue in a test cluster, and it was definitely the
> > llog_cancel steps that caused it. Clients couldn't process the llog
> > properly on new mounts and would fail. We had to completely clear the
> > llog and run --writeconf on every target to regenerate it from scratch.
> > 
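> > For anyone who needs to do the same: a rough sketch of the writeconf
> > regeneration, with device paths hypothetical:
> > 
> >   # stop the filesystem completely (all clients and all targets unmounted)
> >   mds# tunefs.lustre --writeconf /dev/mdt_dev
> >   oss# tunefs.lustre --writeconf /dev/ost_dev    # repeat on every OST
> >   # remount in order: MGS/MDT first, then OSTs, then clients
> > 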
> > The cluster is up and running now but I would certainly recommend at least
> > revising that section of the manual.
> > 
> > 
> > 
> > On Mon, Jul 10, 2023 at 5:22 PM Brad Merchant
> > <bmerchant at cambridgecomputer.com> wrote:
> > > We deactivated half of our 32 OSTs after draining them, following the
> > > steps in section 14.9.3 of the Lustre manual:
> > > 
> > > https://doc.lustre.org/lustre_manual.xhtml#lustremaint.remove_ost
> > > 
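> > > For reference, that deactivation step is run on the MGS; a rough sketch
> > > for a single OST (this conf_param form matches the cmd=cf00f record in
> > > the logs below):
> > > 
> > >   mgs# lctl conf_param hydra-OST0010.osc.active=0
> > > 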
> > > After running the steps in subhead "3. Deactivate the OST." on OST0010-
> > > OST001f, new client mounts fail with the log messages below. Existing
> > > client mounts seem to function correctly, but they're a bit of a ticking
> > > time bomb because they are configured with autofs.
> > > 
> > > The llog_cancel steps are new to me, and the issues seemed to appear after
> > > those commands were issued (though I can't say that 100% definitively).
> > > Servers are running 2.12.5 and clients are on 2.14.x.
> > > 
> > > 
> > > Jul 10 15:22:40 adm-sup1 kernel: LustreError:
> > > 26814:0:(obd_config.c:1514:class_process_config()) no device for: hydra-
> > > OST0010-osc-ffff8be5340c2000
> > > Jul 10 15:22:40 adm-sup1 kernel: LustreError:
> > > 26814:0:(obd_config.c:2038:class_config_llog_handler())
> > > MGC172.16.100.101@o2ib: cfg command failed: rc = -22
> > > Jul 10 15:22:40 adm-sup1 kernel: Lustre:    cmd=cf00f 0:hydra-OST0010-osc
> > >  1:osc.active=0
> > > Jul 10 15:22:40 adm-sup1 kernel: LustreError: 15b-f:
> > > MGC172.16.100.101@o2ib: Configuration from log hydra-client failed from
> > > MGS -22. Check client and MGS are on compatible version.
> > > Jul 10 15:22:40 adm-sup1 kernel: Lustre: hydra: root_squash is set to
> > > 99:99
> > > Jul 10 15:22:40 adm-sup1 systemd-udevd[26823]: Process '/usr/sbin/lctl
> > > set_param 'llite.hydra-ffff8be5340c2000.nosquash_nids=192.168.80.84@tcp
> > > 192.168.80.122@tcp 192.168.80.21@tcp 172.16.90.11@o2ib 172.16.100.211@o2ib
> > > 172.16.100.212@o2ib 172.16.100.213@o2ib 172.16.100.214@o2ib
> > > 172.16.100.215@o2ib 172.16.90.51@o2ib'' failed with exit code 2.
> > > Jul 10 15:22:40 adm-sup1 kernel: Lustre: Unmounted hydra-client
> > > Jul 10 15:22:40 adm-sup1 kernel: LustreError:
> > > 26803:0:(obd_mount.c:1680:lustre_fill_super()) Unable to mount  (-22)
> > > 
> > > 
> > > 
> > > 
