[lustre-discuss] RE: No free catalog slots for log (Lustre 2.5.3 & Robinhood 2.5.3)

Alexander Boyko alexander.boyko at seagate.com
Fri Dec 4 05:36:33 PST 2015

> Here are 4 questions to which we cannot find answers in LU-1586:
> 1.       According to Andres's reply, there should be some unconsumed
> changelog files on our MDT, and these files have taken all the space (file
> quotas?) Lustre gives to the changelog. With Lustre 2.1, these files are under
> the OBJECTS directory and can be listed in ldiskfs mode. In our case, with
> Lustre 2.5.3, no OBJECTS directory can be found. In this case, how
> can we monitor the situation before the unconsumed changelogs take up all
> the disk space?
The changelog is based on one catalog file and plain llog files. The catalog
stores a limited number of records, about 64768. A catalog record is 64
bytes, and each record holds information about one plain llog file. A plain
llog file stores records about filesystem operations; it holds about 64768
records, of varying size. So the changelog can store roughly 64768^2
operations, and it occupies filesystem space. The error "no free catalog
slots" happens when the changelog catalog doesn't have a slot to store a
record about a new plain llog. Either all slots are filled, or the internal
changelog markers became inconsistent and the internal logic doesn't work.
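
As a rough sketch of that arithmetic, using only the approximate figures above (the real number of records per plain llog depends on record size, and the catalog's own header is not counted):

```sh
#!/bin/sh
# Approximate figures from the description above (not exact limits).
catalog_slots=64768          # catalog records, one per plain llog
records_per_plain_llog=64768 # varies in practice with record size
catalog_record_size=64       # bytes per catalog record

max_records=$((catalog_slots * records_per_plain_llog))
catalog_bytes=$((catalog_slots * catalog_record_size))

echo "max changelog records: $max_records"
echo "catalog record area:   $catalog_bytes bytes (header not included)"
```

64768^2 comes to about 4.19 billion records before every catalog slot is in use.
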
To get closer to the root cause, dump the changelog catalog and check its
bitmap: are there free slots? Something like:

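# dump the catalog, then count used slots (type 1064553b = one plain llog each)
debugfs -R "dump changelog_catalog changelog_catalog.dmp" /dev/md55 &&
used=$(llog_reader changelog_catalog.dmp | grep -c "type=1064553b")
echo "used catalog slots: $used of ~64768"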
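
For question 1 (watching the catalog before it fills up), one possible approach is to run the dump above periodically and compare the used count with the capacity. A minimal sketch, where check_catalog_slots and the WARN_FREE threshold are hypothetical names picked for illustration, and 64768 is the approximate capacity mentioned above:

```sh
#!/bin/sh
# Hypothetical helper: takes the number of used catalog slots (e.g. $used
# from the debugfs/llog_reader pipeline), prints how many remain and
# returns non-zero when fewer than WARN_FREE slots are left.
CATALOG_SLOTS=64768   # approximate catalog capacity
WARN_FREE=1000        # arbitrary threshold for illustration

check_catalog_slots() {
    n_used=$1
    n_free=$((CATALOG_SLOTS - n_used))
    if [ "$n_free" -lt "$WARN_FREE" ]; then
        echo "WARNING: only $n_free changelog catalog slots free"
        return 1
    fi
    echo "OK: $n_free changelog catalog slots free"
    return 0
}
```

Usage: `check_catalog_slots "$used" || <alert somehow>`, e.g. from cron. Counting catalog slots only tracks the number of plain llogs, not their fill level, so it is a coarse early warning.
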

> 2.       Why are there so many unconsumed changelogs? Could it be related to
> our frequent remounts of the MDT (abort_recovery mode)?
An umount creates a half-empty plain llog file, and changelog_clear can't
remove it even when all of its slots are freed. Only a new mount can remove
that file. So it may or may not be related.

> 3.   When we remount the MDT, robinhood is still running. Why can't robinhood
> consume those old changelogs after the MDT service has recovered?
> 4.   Why is there such a huge difference between the current index (4199610352)
> and the cl1 index (49035933)?
> Thank you for your time and help !
> Wang,Lu
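
A rough back-of-the-envelope check for question 4, assuming that records with an index above cl1's index are those cl1 has not yet cleared, and reusing the approximate 64768^2 capacity from above (the real capacity varies with record size):

```sh
#!/bin/sh
# Index values quoted in question 4.
current_index=4199610352
cl1_index=49035933

# Approximate capacity: ~64768 catalog slots x ~64768 records per plain llog.
capacity=$((64768 * 64768))

backlog=$((current_index - cl1_index))
echo "records not yet cleared by cl1: $backlog"
echo "approximate changelog capacity: $capacity"
echo "backlog as % of capacity:       $((backlog * 100 / capacity))"
```

This prints a backlog of 4150574419 records, about 98% of the approximate capacity. That alone does not prove the catalog is full, but it is worth confirming with the catalog dump.
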

Alexander Boyko