[lustre-discuss] MDT refuses to mount: "no more free slots in catalog" "can't initialize llog"
Jesse Stroik
jesse.stroik at ssec.wisc.edu
Mon Jun 16 05:40:52 PDT 2025
Hi Lustre users,
When reviewing the configuration logs prior to performing this work, I noticed that one of the OSTs in use is not listed in the configuration log for the MDT. The entries jump from OST001e to OST0020, with no mention of OST001f anywhere in that log.
The configuration log for OST001f itself looks normal, and that OST is roughly as full as any of the others, so it was getting data stored on it.
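For reference, the check was roughly the following, run with only the MGS mounted (the fsname "arc15" comes from the log messages quoted below, and "arc15-client" is the standard name of the client configuration log):
=======
# count references to the missing OST in the MDT and client config logs
lctl --device MGS llog_print arc15-MDT0000 | grep -c OST001f
lctl --device MGS llog_print arc15-client | grep -c OST001f

# the OST's own configuration log, which looks normal
lctl --device MGS llog_print arc15-OST001f
=======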
This raises a concern for me: is it likely that one of the OSTs will have data we cannot recover if I rewrite these logs? At this point the file system cannot mount, so I believe rewriting the logs is necessary in any case.
Jesse
________________________________________
From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Jesse Stroik via lustre-discuss <lustre-discuss at lists.lustre.org>
Sent: Tuesday, June 10, 2025 12:43 PM
To: lustre-discuss at lists.lustre.org
Subject: [lustre-discuss] MDT refuses to mount: "no more free slots in catalog" "can't initialize llog"
Hi Lustre users,
Recently we had a breaker fail overnight, affecting part of one of our data centers, including an older Lustre setup. The file system was in active use prior to the power failure.
This is Lustre 2.15.1 / ZFS 2.1.2 running on Rocky 8, using ZFS for all of the backend file systems.
When I attempt to start Lustre on the MDS, it mounts the MGS, then starts mounting the MDT and goes into recovery before failing with the following messages:
=======
Lustre: 5089:0:(llog_cat.c:101:llog_cat_new_log()) arc15-OST0001-osc-MDT0000: there are no more free slots in catalog [0x5:0x1:0x0]:0
LustreError: 5089:0:(osp_sync.c:1553:osp_sync_init()) arc15-OST0001-osc-MDT0000: can't initialize llog: rc = -28
LustreError: 5089:0:(obd_config.c:774:class_setup()) setup arc15-OST0001-osc-MDT0000 failed (-28)
LustreError: 6519:0:(obd_config.c:2001:class_config_llog_handler()) MGC172.16.23.25@o2ib: cfg command failed: rc = -28
Lustre: cmd=cf003 0:arc15-OST0001-osc-MDT0000 1:arc15-OST0001_UUID 2:172.16.23.18@o2ib
=======
If I attempt to start Lustre again or mount the MDT directly after this first attempt, I also see this message in the logs:
=======
LustreError: 5350:0:(genops.c:522:class_register_device()) arc15-OST0001-osc-MDT0000: already exists, won't add
=======
LNET communication looks good, and this error happens whether or not the OSS units are powered up and have their OSTs mounted. The system didn't have a changelog user registered and wasn't consuming any changelog space.
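For what it's worth, the LNET check was just quick pings between the servers, roughly like this (the NID is the OSS NID from the cfg line above):
=======
lctl list_nids
lctl ping 172.16.23.18@o2ib
=======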
If I mount just the MGS, I can still see the configuration logs for all devices with "lctl --device MGS llog_print <device>".
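For example, something like this on the MDS while only the MGS is mounted ("arc15" is our fsname; llog_catlist simply enumerates the logs the MGS holds):
=======
# list every configuration log held by the MGS
lctl --device MGS llog_catlist

# then print any individual log, e.g. the MDT's
lctl --device MGS llog_print arc15-MDT0000
=======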
My planned next step is to regenerate the Lustre configuration logs following the instructions here:
https://doc.lustre.org/lustre_manual.xhtml#lustremaint.regenerateConfigLogs
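Condensed from that section, the plan looks roughly like this (the pool/dataset names and mount points below are placeholders for our actual ZFS targets):
=======
# with all clients unmounted and all targets stopped:
tunefs.lustre --writeconf mdtpool/arc15-mdt0        # MDT first
tunefs.lustre --writeconf ostpool/arc15-ost0001     # then each OST, on its OSS

# remount in order: MGS, MDT, OSTs (then clients)
mount -t lustre mgspool/mgs /mnt/lustre-mgs
mount -t lustre mdtpool/arc15-mdt0 /mnt/lustre-mdt0
mount -t lustre ostpool/arc15-ost0001 /mnt/lustre-ost0001
=======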
I do have snapshots of the MGS and MDT stored on a zpool on another server.
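If the writeconf attempt makes things worse, the fallback would presumably be to pull those snapshots back, something like this (hostname, pool, dataset, and snapshot names are placeholders):
=======
# copy the saved MDT snapshot back from the backup server into a fresh
# dataset, which could then be inspected or swapped in for the damaged one
ssh backuphost zfs send backuppool/arc15-mdt0@pre-failure | \
    zfs receive mdtpool/arc15-mdt0-restored
=======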
Before I move on with that step, is there anything I should check or might be missing?
Thanks,
Jesse