[lustre-discuss] no more free slots in catalog

Tue Dec 11 06:32:52 PST 2018

On 11/12/2018 14:13, quentin.bouget at cea.fr wrote:
> On 11/12/2018 at 10:28, Julien Rey wrote:
>> On 10/12/2018 13:33, quentin.bouget at cea.fr wrote:
>>> On 10/12/2018 at 12:00, Julien Rey wrote:
>>>> Hello,
>>>>
>>>> We are running Lustre
>>>> 2.8.0-RC5--PRISTINE-2.6.32-573.12.1.el6_lustre.x86_64.
>>>>
>>>> Since Thursday we have been getting a "bad address" error when trying
>>>> to write to the Lustre volume.
>>>>
>>>> Looking at the logs on the MDS, we see messages like these:
>>>>
>>>> Dec 10 06:26:18 localhost kernel: Lustre: 
>>>> 9593:0:(llog_cat.c:93:llog_cat_new_log()) lustre-MDD0000: there are 
>>>> no more free slots in catalog
>>>> Dec 10 06:26:18 localhost kernel: Lustre: 
>>>> 9593:0:(llog_cat.c:93:llog_cat_new_log()) Skipped 45157 previous 
>>>> similar messages
>>>> Dec 10 06:26:18 localhost kernel: LustreError: 
>>>> 9593:0:(mdd_dir.c:887:mdd_changelog_ns_store()) lustre-MDD0000: 
>>>> cannot store changelog record: type = 6, name = 
>>>> 'PEPFOLD-00016_bestene1-mc-SC-min-grompp.log', t = 
>>>> [0x20000a58f:0x858e:0x0], p = [0x20000a57d:0x17fd9:0x0]: rc = -28
>>>> Dec 10 06:26:18 localhost kernel: LustreError: 
>>>> 9593:0:(mdd_dir.c:887:mdd_changelog_ns_store()) Skipped 45157 
>>>> previous similar messages
>>>>
>>>>
>>>> I saw here that this issue was supposed to be solved in 2.8.0:
>>>> https://jira.whamcloud.com/browse/LU-6556
>>>>
>>>> Could someone help us resolve this situation?
>>>>
>>>> Thanks.
>>>>
>>> Hello,
>>>
>>> The log messages don't point at a "bad address" issue but rather at 
>>> a "no space left on device" one ("rc = -28" --> -ENOSPC).
>>>
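As a quick check of the errno reading above, the mapping can be looked up with Python's standard `errno` module. This is only a minimal sketch: the helper name `describe_rc` is made up for illustration and is not part of Lustre, and the numeric values assume a Linux errno table.

```python
import errno
import os

def describe_rc(rc):
    """Map a kernel-style return code (e.g. -28) to (symbolic name, message)."""
    code = abs(rc)  # kernel logs report errors as negative errno values
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)

# rc from the mdd_changelog_ns_store() log line
print(describe_rc(-28))
# the errno behind the "bad address" message seen on the client
print(describe_rc(-errno.EFAULT))
```

The first call should report ENOSPC, which matches the "no more free slots in catalog" message rather than a genuine addressing fault.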
>>> You most likely registered a changelog user on your MDS at some
>>> point, and that user is not consuming changelogs.
>>>
>>> You can check this by running:
>>>
>>> [mds0]# lctl get_param mdd.*.changelog_users
>>> mdd.lustre-MDT0000.changelog_users=
>>> current index: 3
>>> ID    index
>>> cl1   0
>>>
>>> The most important thing to look for is the distance between
>>> "current index" and the index for "cl1", "cl2", ...
>>> I expect that, for at least one changelog user, that distance is
>>> 2^32 (the maximum number of changelog records).
>>> Note that changelog indexes wrap around (0, 1, 2, ..., 4294967295,
>>> 0, 1, ...).
>>>
>>> If I am right, then you can either deregister the changelog user:
>>>
>>> [mds0]# lctl --device lustre-MDT0000 changelog_deregister cl1
>>>
>>> or acknowledge the records:
>>>
>>> [client]# lfs changelog_clear lustre-MDT0000 cl1 0
>>>
>>> (clearing with index 0 is a shortcut for "acknowledge every
>>> changelog record")
>>>
>>> Both those options may take a while.
>>>
>>> There is a third option that might yield a faster result, but it is
>>> also much more dangerous to use (you might want to check with your
>>> support first):
>>>
>>> [mds0]# umount /dev/mdt0
>>> [mds0]# mount -t ldiskfs /dev/mdt0 /mnt/lustre-mdt0
>>> [mds0]# rm /mnt/lustre-mdt0/changelog_catalog
>>> [mds0]# rm /mnt/lustre-mdt0/changelog_users
>>> [mds0]# umount /dev/mdt0
>>> [mds0]# mount -t lustre /dev/mdt0 <...> # remount the mdt where it was
>>>
>>> *I cannot guarantee this will not trash your filesystem. Use at your
>>> own risk.*
>>>
>>> ---
>>>
>>> In recent versions (2.12, maybe even 2.10), Lustre comes with a
>>> built-in garbage collector for slow/inactive changelog users.
>>>
>>> Regards,
>>> Quentin Bouget
>>>
>>
>> Hello Quentin,
>>
>> Many thanks for your quick reply.
>>
>> This is what I got when I issued the command you suggested:
>>
>> [root@lustre-mds]# lctl get_param mdd.*.changelog_users
>> mdd.lustre-MDT0000.changelog_users=
>> current index: 4160462682
>> ID    index
>> cl1   21020582
>>
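The distance check described earlier in the thread can be sketched in a few lines of Python. The function names are hypothetical, the parser only handles the output layout shown here, and the 2^32 wraparound is taken from the explanation in this thread rather than verified against the Lustre source.

```python
import re

INDEX_SPACE = 2 ** 32  # assumption from the thread: indexes wrap at 2^32

def parse_changelog_users(text):
    """Return (current_index, {user_id: index}) from changelog_users output."""
    current = int(re.search(r"current index:\s*(\d+)", text).group(1))
    users = {m.group(1): int(m.group(2))
             for m in re.finditer(r"^(cl\d+)\s+(\d+)\s*$", text, re.MULTILINE)}
    return current, users

def backlog(current, user_index):
    """Unacknowledged records for one user, allowing for index wraparound."""
    return (current - user_index) % INDEX_SPACE

# Output pasted from the lctl get_param command above
sample = """mdd.lustre-MDT0000.changelog_users=
current index: 4160462682
ID    index
cl1   21020582
"""

current, users = parse_changelog_users(sample)
for user, index in users.items():
    pending = backlog(current, index)
    print(f"{user}: {pending} pending records "
          f"({pending / INDEX_SPACE:.1%} of 2^32)")
```

For the values above this gives 4139442100 pending records for cl1, roughly 96% of 2^32, which is consistent with a registered user that never consumed its records.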
>> I then issued the following command:
>> [root@lustre-mds]# lctl --device lustre-MDT0000 changelog_deregister cl1
>>
>> It's been running for almost 20 hours now. Do you have an estimate
>> of how long it could take?
> When you deregister a changelog user: every changelog record has to be 
> invalidated (maybe this is batched, but I don't know enough about the 
> on-disk structure to say).
>
> I do not recall ever waiting that long. Then again, I have never personally
> deregistered a changelog user with that many pending changelog records.

The changelog_deregister command still hasn't finished. Is there any
way to track the progress of the record purge?

>
> If you just want to make sure Lustre is doing something, you can have 
> a look at your mdt0: invalidating changelog records should generate a 
> high load of small random writes.
> If the device is idle, something is probably wrong.

Hard to tell. iostat doesn't show much I/O.
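
One rough way to put a number on "not much I/O" is to compare the "writes completed" counter in /proc/diskstats across two samples. The sketch below assumes the standard Linux diskstats column layout; the device name `sdb` and the counter values are made up for illustration, and the function names are hypothetical.

```python
def writes_completed(diskstats_text, device):
    """Return the 'writes completed' counter for one device.

    Expects text in /proc/diskstats format: major, minor, name, then the
    per-device counters (the 8th column is writes completed since boot).
    """
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) > 7 and fields[2] == device:
            return int(fields[7])
    raise KeyError(device)

def write_rate(before, after, device, interval_s):
    """Average completed writes per second between two samples."""
    delta = writes_completed(after, device) - writes_completed(before, device)
    return delta / interval_s

# Two illustrative samples taken 10 seconds apart (values are invented)
before = "   8      16 sdb 1200 30 96000 450 5000 120 400000 9000 0 7000 9450"
after = "   8      16 sdb 1200 30 96000 450 5250 130 402000 9100 0 7100 9550"

print(write_rate(before, after, "sdb", 10))
```

On the MDS itself, the two samples would come from reading /proc/diskstats twice with a pause in between, using the block device that backs the MDT. A rate near zero would suggest the purge is not touching the disk.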

>
> Is your filesystem still unavailable?

The following command doesn't show any registered changelog user:

cat /proc/fs/lustre/mdd/lustre-MDT0000/changelog_users

I tried to mount the Lustre volume on a client. I no longer get the "Bad
Address" error.

Best,

-- 
Julien REY

Plate-forme RPBS
Molécules Thérapeutiques In Silico (MTi)
Université Paris Diderot - Paris VII
tel : 01 57 27 83 95
