[lustre-discuss] RE No free catalog slots for log ( Lustre 2.5.3 & Robinhood 2.5.3 )

Alexander Boyko alexander.boyko at seagate.com
Mon Dec 7 05:18:41 PST 2015


>
> By the way, are the llog files you mentioned virtual or real? If they are
> real, where are they located? Do I need to clean them up manually?

They are real; they are located under O/1/...
 lustre/utils/llog_reader ./changelog_catalog.dmp
rec #1 type=1064553b len=64
Header size : 8192
Time : Mon Dec  7 15:44:37 2015
Number of records: 1
Target uuid :
-----------------------
#01 (064)ogen=0 name=0x8:1
...
I dumped the file and checked it; its location is based on the name in the
catalog record.
debugfs:  dump O/1/d8/8 plain.llog
lustre/utils/llog_reader ./plain.llog
rec #1 type=10660000 len=96 offset 8192
Header size : 8192
Time : Mon Dec  7 15:46:40 2015
Number of records: 1
Target uuid :
-----------------------
#01 (096)changelog record id:0x0 cr_flags:0x1000 cr_type:CREAT(0x1)
It looks like O/1/ is used for llog files only.
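
To pull the plain llog names out of a catalog dump without reading it by eye, something like the following Python sketch could be used. It only parses lines shaped like the "#01 (064)ogen=0 name=0x8:1" entry shown above. Reading the two parts of the name as a hexadecimal object id and a decimal sequence is an assumption based on that single example (0x8:1 was dumped from O/1/d8/8), and the function name is hypothetical.

```python
import re

# Matches catalog entries such as: "#01 (064)ogen=0 name=0x8:1"
ENTRY_RE = re.compile(
    r"^#(\d+) \(\d+\)ogen=[0-9A-Fa-f]+ name=(0x[0-9A-Fa-f]+):(\d+)"
)

def plain_llog_names(lines):
    """Return (record index, object id, sequence) for each catalog entry.

    Assumption: the part before ':' is a hex object id and the part
    after it is a decimal sequence, as suggested by the one example.
    """
    names = []
    for line in lines:
        m = ENTRY_RE.match(line.strip())
        if m:
            names.append((int(m.group(1)), int(m.group(2), 16), int(m.group(3))))
    return names

print(plain_llog_names(["#01 (064)ogen=0 name=0x8:1"]))  # [(1, 8, 1)]
```

Lines that do not look like catalog entries (headers, "rec #..." lines) are simply skipped.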


On Mon, Dec 7, 2015 at 4:55 AM, wanglu <wanglu at ihep.ac.cn> wrote:

> Hi Alexander,
>
> Before I received this reply, I deregistered the cl1 user. It took a very
> long time, and I am not sure whether it finished successfully, since the
> server crashed once the next morning.
> Then I moved the old changelog_catalog file away and created a zero-length
> changelog_user file instead.
> This is what I got from the old changelog_catalog file.
> # ls -l /tmp/changelog.dmp
>     -rw-r--r-- 1 root root 4153280 Dec  6 06:54 /tmp/changelog.dmp
>     # llog_reader changelog.dmp |grep "type=1064553b" |wc -l
>     63432
> This number is smaller than 64768; I am not sure whether this is related
> to the unfinished deregistration.
>
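As a rough cross-check of these numbers, the size of the dump can be converted into a slot count. This is only a sketch: it assumes the 8192-byte header that llog_reader reports elsewhere in this thread, and fixed 64-byte catalog records stored back to back after that header. The function name is hypothetical.

```python
HEADER_BYTES = 8192   # "Header size : 8192" as printed by llog_reader
RECORD_BYTES = 64     # "len=64" on every catalog record

def slots_in_dump(file_bytes):
    """Number of 64-byte record slots that fit after the header (assumed layout)."""
    return (file_bytes - HEADER_BYTES) // RECORD_BYTES

file_bytes = 4153280  # size of /tmp/changelog.dmp from ls -l
listed = 63432        # records counted with grep | wc -l

print(slots_in_dump(file_bytes))           # 64767
print(slots_in_dump(file_bytes) - listed)  # 1335
```

Under those assumptions the file holds 64767 record slots, of which 1335 are not listed by llog_reader.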
> The first record number is 1 and the last record number is 64767, so I
> think some record numbers may have been skipped:
>     # llog_reader changelog.dmp |grep "type=1064553b" |head -n 1
>     rec #1 type=1064553b len=64
>     # llog_reader changelog.dmp |grep "type=1064553b" |tail -n 1
>     rec #64767 type=1064553b len=64
>     # llog_reader changelog.dmp |grep "^rec" | grep -v "type=1064553b"
> returns 0 lines.
>
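One way to see which record numbers are skipped is to parse the "rec #N" lines and list the indices missing between the first and the last one. The Python sketch below does that for text in the format shown above; the function name is hypothetical, and what a skipped number means (for example a cancelled record) is not something this sketch can tell.

```python
import re

# Matches lines such as: "rec #1 type=1064553b len=64"
REC_RE = re.compile(r"^rec #(\d+) type=([0-9a-f]+) len=(\d+)")

def missing_record_numbers(lines, rec_type="1064553b"):
    """Return record indices absent between the first and last index seen."""
    seen = sorted(
        int(m.group(1))
        for m in map(REC_RE.match, lines)
        if m and m.group(2) == rec_type
    )
    if not seen:
        return []
    present = set(seen)
    return [n for n in range(seen[0], seen[-1] + 1) if n not in present]

sample = """\
rec #1 type=1064553b len=64
rec #2 type=1064553b len=64
rec #5 type=1064553b len=64
""".splitlines()
print(missing_record_numbers(sample))  # [3, 4]
```

For the real dump, the lines would come from the saved output of llog_reader changelog.dmp.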
> By the way, are the llog files you mentioned virtual or real? If they are
> real, where are they located? Do I need to clean them up manually?
>
> Thanks,
> Lu,Wang
> *From:* Alexander Boyko <alexander.boyko at seagate.com>
> *Date:* 2015-12-04 21:36
> *To:* wanglu <wanglu at ihep.ac.cn>; lustre-discuss
> <lustre-discuss at lists.lustre.org>
> *Subject:* RE [lustre-discuss] No free catalog slots for log ( Lustre
> 2.5.3 & Robinhood 2.5.3 )
>
>> Here are 4 questions for which we cannot find answers in LU-1586:
>>
>> 1.       According to Andres's reply, there should be some unconsumed
>> changelog files on our MDT, and these files have taken up all the space
>> (file quotas?) that Lustre gives to changelogs. With Lustre 2.1, these
>> files are under the OBJECTS directory and can be listed in ldiskfs mode.
>> In our case, with Lustre 2.5.3, no OBJECTS directory can be found. In
>> this case, how can we monitor the situation before the unconsumed
>> changelogs take up all the disk space?
>>
> The changelog is based on one catalog file and a set of plain llog files.
> The catalog stores a limited number of records, about 64768. Each catalog
> record is 64 bytes and holds information about one plain llog file. A
> plain llog file stores records about IO operations; it also holds about
> 64768 records, of varying size. So the changelog can store about 64768^2
> IO operations, and it occupies filesystem space. The error "no free
> catalog slots" occurs when the changelog catalog has no slot left to store
> a record for a new plain llog: either all slots are filled, or the
> internal changelog markers have become inconsistent and the internal
> logic no longer works.
> To get closer to the root cause, you need to dump the changelog catalog
> and check its bitmap. Are there free slots? Something like
>
> debugfs -R "dump changelog_catalog changelog_catalog.dmp" /dev/md55 &&
> used=`llog_reader changelog_catalog.dmp | grep "type=1064553b" | wc -l`
>
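The same count can be done in Python on saved llog_reader output, together with the capacity arithmetic from the paragraph above. This is a sketch under two assumptions: that llog_reader lists only records that are still in use, and that the catalog holds exactly 64768 slots (the figure above is approximate). The function names are hypothetical.

```python
CATALOG_SLOTS = 64768                # approximate catalog capacity quoted above
CATALOG_REC_TYPE = "type=1064553b"   # record type matched by the grep above

def used_catalog_slots(llog_reader_output):
    """Count catalog records, like: grep "type=1064553b" | wc -l"""
    return sum(
        1 for line in llog_reader_output.splitlines()
        if CATALOG_REC_TYPE in line
    )

def free_catalog_slots(llog_reader_output, total=CATALOG_SLOTS):
    """Slots left, assuming every listed record occupies exactly one slot."""
    return total - used_catalog_slots(llog_reader_output)

sample = "rec #1 type=1064553b len=64\nHeader size : 8192\n"
print(used_catalog_slots(sample))   # 1
print(free_catalog_slots(sample))   # 64767
print(CATALOG_SLOTS ** 2)           # 4194893824
```

The last line is the 64768^2 upper bound on changelog records: one catalog slot per plain llog, and about 64768 records per plain llog.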
>> 2.       Why are there so many unconsumed changelogs? Could it be related
>> to our frequent remounts of the MDT (abort_recovery mode)?
>>
> A umount operation creates a half-empty plain llog file, and
> changelog_clear cannot remove it even if all of its slots are freed. Only
> a new mount can remove that file. It may or may not be related.
>
>
>
>> 3.   When we remount the MDT, robinhood is still running. Why can
>> robinhood not consume those old changelogs after the MDT service has
>> recovered?
>> 4.   Why is there a huge difference between the current index
>> (4199610352) and the cl1 index (49035933)?
>>
>> Thank you for your time and help !
>>
>> Wang,Lu
>>
>
> --
> Alexander Boyko
> Seagate
> www.seagate.com
>



-- 
Alexander Boyko
Seagate
www.seagate.com