[Lustre-discuss] renamed directory retains a dentry under its old name?

Oleg Drokin Oleg.Drokin at Sun.COM
Thu Nov 19 18:52:20 PST 2009


Hello!

On Nov 19, 2009, at 7:06 AM, Phil Schwan wrote:

> Hello old friends!  I return with a gift, like an almost-forgotten
> uncle visiting from a faraway land.

Long time no see! ;)

> I have an interesting issue, on 1.6.6:
> # cat /proc/fs/lustre/version
> lustre: 1.6.6
> kernel: patchless
> build:  1.6.6-19700101080000-PRISTINE-.usr.src.kernels.2.6.18-128.1.16.el5-x86_64.-2.6.18-128.1.16.el5
> 
> Consider this setup:
> - one task creates a "foo.working" directory, does all its work
> inside, then renames it to "foo.done"
> 
> - another task polls, waiting for "foo.working" to disappear.  all of
> this occurs on one node.
> 
> - the problem: the rename occurred, but "foo.working" remains as a valid dentry!

Well, this might be bug 2969, I would think, but it depends on how the second task is polling.
Several things were done to avert it in the past; I wonder if 1.8.2 would work better for you.
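
For reference, here is a minimal Python sketch of the setup as described above, showing two ways the second task might poll. It is only an illustration: the helper names (worker, gone_by_stat, gone_by_listing, wait_until_gone) are made up for this example, and it runs against a temporary local directory, where both checks agree. On an affected client, the stat-style check is the one that could keep seeing the old name.

```python
import os
import tempfile
import time


def worker(base, name):
    """Create <name>.working, do the work inside, then rename it to <name>.done."""
    working = os.path.join(base, name + ".working")
    done = os.path.join(base, name + ".done")
    os.mkdir(working)
    # ... the real work would happen inside `working` here ...
    os.rename(working, done)
    return done


def gone_by_stat(path):
    """Poll style 1: look the name up directly (may be answered from a cached dentry)."""
    try:
        os.stat(path)
        return False
    except FileNotFoundError:
        return True


def gone_by_listing(path):
    """Poll style 2: read the parent directory and check whether the name is listed."""
    parent, name = os.path.split(path)
    return name not in os.listdir(parent)


def wait_until_gone(path, check, interval=0.1, timeout=5.0):
    """Poll until check(path) is true; give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check(path):
            return True
        time.sleep(interval)
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        worker(base, "tid65")
        target = os.path.join(base, "tid65.working")
        print("stat says gone:   ", wait_until_gone(target, gone_by_stat))
        print("listing says gone:", wait_until_gone(target, gone_by_listing))
```

The timeout in wait_until_gone is there so that a poller stuck on a stale entry gives up instead of looping forever.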

> Witness:
> 
> node1$ ls
> tape18            tapeLabel_19.txt  tid64.done     tid67.working  tid70.working
> tape19.working    tid62.done        tid65.done     tid68.working
> tapeLabel_18.txt  tid63.done        tid66.working  tid69
> 
> -- Note the absence of a "tid65.working" directory
> 
> node1$ stat tid65.working
>   File: `tid65.working'
>   Size: 12288           Blocks: 24         IO Block: 4096   directory
> Device: 6d48dd40h/1833491776d   Inode: 76317251    Links: 3
> Access: (2775/drwxrwsr-x)  Uid: ( 3005/ stuartm)   Gid: ( 2000/    prod)
> Access: 2009-11-18 17:14:26.000000000 +0800
> Modify: 2009-11-18 16:08:01.000000000 +0800
> Change: 2009-11-18 16:08:01.000000000 +0800
> 
> This is unique to node1.  On node2:

Yes, I think this does match bug 2969 behavior.
We add an entry to the dcache without a lock (not visible in the trace you provided). Then we do the rename, then we do some sort of stat on the renamed
entry and reobtain the lock. Then we do a stat on the old name, and since the lock is on the inode, we find the newly reinstantiated lock and declare
the old dentry valid.
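
To make that sequence concrete, here is a toy model in Python. It is not Lustre code and does not reflect the real data structures; the class ToyClient and all of its methods are hypothetical. The only assumptions it encodes are the ones stated above: cached dentries are keyed by name, the lock is keyed by inode, and a cached dentry is treated as valid whenever its inode currently has a lock.

```python
class ToyClient:
    """Toy model: dentries are keyed by name, locks are keyed by inode."""

    def __init__(self):
        self.server_names = {}      # authoritative namespace: name -> inode
        self.dcache = {}            # client-side cached dentries: name -> inode
        self.locked_inodes = set()  # inodes for which the client holds a lock

    def mkdir(self, name, inode):
        """Create the directory; the dentry is cached without a lock."""
        self.server_names[name] = inode
        self.dcache[name] = inode

    def rename(self, old, new):
        """Rename on the server; in this toy the old cached dentry is left behind."""
        self.server_names[new] = self.server_names.pop(old)

    def drop_lock(self, inode):
        """Model the lock going away, e.g. through normal lock turnover."""
        self.locked_inodes.discard(inode)

    def stat(self, name):
        """Return the inode for `name`, or None if the name does not resolve."""
        inode = self.dcache.get(name)
        if inode is not None and inode in self.locked_inodes:
            # A lock exists on the inode, so the cached dentry is declared
            # valid without asking the server, even if the name is stale.
            return inode
        # Otherwise ask the server, which also (re)obtains the lock.
        inode = self.server_names.get(name)
        if inode is None:
            self.dcache.pop(name, None)
            return None
        self.dcache[name] = inode
        self.locked_inodes.add(inode)
        return inode


if __name__ == "__main__":
    client = ToyClient()
    client.mkdir("tid65.working", 76317251)
    client.rename("tid65.working", "tid65.done")
    print("new name:", client.stat("tid65.done"))             # reobtains the lock
    print("old name:", client.stat("tid65.working"))          # stale dentry revalidated
    client.drop_lock(76317251)
    print("old name, lock gone:", client.stat("tid65.working"))
```

In this model the old name only stops resolving once the lock is gone, which is consistent with the problem clearing up after the lock is dropped.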

> We stopped the job when it became clear that it would never finish.
> Eventually that lock did disappear -- likely just due to normal DLM
> turnover -- and the problem resolved itself.  If the task had been
> allowed to continue, however, constantly stat()ing that dead
> directory, the lock would have remained at the bottom of the LRU --
> and thus it would be an effectively infinite loop!

I am sorry to hear we came back to haunt you after all this time.
I wonder if my patch for 20323 would have helped in this case
(or just always returning the lock), though on the other hand this
is an inode from mkdir and so might never have gone through the open path.
Bug 16417 is what landed in 1.8.2; it is a complete rework of the
dcache caching logic for dentries and has a better chance of fixing this,
I would say.
If not, it would be great if the log started earlier in time, definitely
before the rename happens.

I hope this problem did not ruin your day in the end.

And we do miss you. Does your coming with such a question mean you are on your way back to us? ;)

Bye,
    Oleg


