[lustre-discuss] Question about lctl changelog_deregister

Henri Doreau henri.doreau at cea.fr
Mon Jan 29 02:15:43 PST 2018


On 27/janv. - 00:07 Mohr Jr, Richard Frank (Rick Mohr) wrote:
> I have started playing around with Lustre changelogs, and I have noticed a behavior with the “lctl changelog_deregister” command that I don’t understand.  I tried running a little test by enabling changelogs on my MDS server:
> 
> [root@server ~]# lctl --device orhydra-MDT0000 changelog_register
> orhydra-MDT0000: Registered changelog userid 'cl5'
> 
> On a client system, I ran “lfs changelog” to look at some of the entries.  In particular, I was looking at the last entry index number to see how quickly entries were being added:
> 
> [root@client ~]# lfs changelog orhydra-MDT0000 | tail -1
> 432339 11CLOSE 22:18:01.882533366 2018.01.26 0x3 t=[0x20000b4a7:0xe8de:0x0]
> 
> …waited a bit...
> 
> [root@client ~]# lfs changelog orhydra-MDT0000 | tail -1
> 694305 08RENME 22:24:39.345273014 2018.01.26 0x1 t=[0x20000b4d2:0x16f3d:0x0] p=[0x20000b528:0xe56a:0x0] restart.xml s=[0x20000b4d2:0x16f45:0x0] sp=[0x20000b528:0xe56a:0x0] restart.xml.tmp
> 
> At this point, I didn’t have any changelog consumer in place.  I was just simply looking at the entries.  When I was done, I deregistered the changelog userid:
> 
> [root@server ~]# lctl --device orhydra-MDT0000 changelog_deregister cl5
> orhydra-MDT0000: Deregistered changelog user 'cl5'
> 
> Based on what I read in the manual, this should remove all the changelog entries.  However, when I checked, I found that there were still several thousand entries:
> 
> [root@client ~]# lfs changelog orhydra-MDT0000 | head -1
> 705647 11CLOSE 22:24:57.281171001 2018.01.26 0x3 t=[0x20000ac25:0x1e308:0x0]
> [root@client ~]# lfs changelog orhydra-MDT0000 | tail -1
> 742926 11CLOSE 22:26:04.108790913 2018.01.26 0x3 t=[0x20000ac25:0x1e313:0x0]
> 
> I tried to clear them using the lfs changelog_clear command, but of course that failed because the userid was no longer registered.  I also waited a while in case it just took some time to clear the entries, but after several hours, they were still there.
> 
> Am I misunderstanding what is supposed to happen when a userid is deregistered?  Or did I mess up a command somewhere?  Or is this a bug?
> 
> --
> Rick Mohr
> Senior HPC System Administrator
> National Institute for Computational Sciences
> http://www.nics.tennessee.edu
> 

Hello,

The most likely explanation is that other changelog readers are still
registered. Notice that the registration command returned "cl5". If any
of the earlier users cl1 through cl4 have not been deregistered, then
the behavior you see is normal.

You can list registered changelog users by looking at
/proc/fs/lustre/mdd/orhydra-MDT0000/changelog_users
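
To spot a reader that is holding records back, something like the
following sketch may help. It assumes the file prints a "current index:"
line followed by one "ID index" row per user; the exact layout can differ
between Lustre versions, and the function name is only illustrative.

```shell
# Print changelog users whose acknowledged index trails the current index.
# Reads changelog_users text on stdin. Assumed layout (may vary by version):
#   current index: 742926
#   ID    index
#   cl1   705646
#   cl5   742926
lagging_changelog_users() {
    awk '
        # The current index is the last field of the "current index:" line.
        /current index:/ { current = $NF + 0; next }
        # User rows start with an ID such as cl1; column 2 is the acked index.
        $1 ~ /^cl[0-9]+/ && ($2 + 0) < current {
            printf "%s is %d records behind\n", $1, current - $2
        }
    '
}
```

On the MDS this could be fed with
"cat /proc/fs/lustre/mdd/orhydra-MDT0000/changelog_users | lagging_changelog_users".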

You are right to expect the records to be cleared when the last reader
is deregistered.  When you deregister a user, Lustre treats that user as
having acknowledged every record, and the garbage collection mechanism
drops the records that all remaining changelog readers have
acknowledged.  Therefore, if at least one reader lags behind and
acknowledges nothing, records accumulate in the changelog.  If that
situation lasts too long you will run into trouble, because the
changelog can only store a limited number of records.
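
As a rough illustration of that rule (a simplified model, not Lustre's
actual implementation), the highest record that can be dropped is the
smallest index acknowledged by any registered user. The function name
and the "ID index" input layout below are assumptions made for the
sketch.

```shell
# Simplified model of the garbage-collection rule: records up to the
# smallest acknowledged index among all registered users can be dropped.
# Reads "ID index" rows on stdin; lines that are not user rows are ignored.
purgeable_upto() {
    awk '
        $1 ~ /^cl[0-9]+/ {
            idx = $2 + 0
            if (min == "" || idx < min) min = idx
        }
        END { if (min != "") print min }
    '
}
```

With users cl1 at 705646 and cl5 at 742926, this prints 705646. Once
cl1 is deregistered, or has acknowledged everything with
"lfs changelog_clear orhydra-MDT0000 cl1 0", the purge point moves up
to the remaining readers' minimum.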

I just tried this on 2.10.2, and it seems to work as expected.

HTH

-- 
Henri
