[Lustre-discuss] renamed directory retains a dentry under its old name?

Phil Schwan phils at dugeo.com
Thu Nov 19 04:06:27 PST 2009


Hello old friends!  I return with a gift, like an almost-forgotten
uncle visiting from a faraway land.

I have an interesting issue, on 1.6.6:

# cat /proc/fs/lustre/version
lustre: 1.6.6
kernel: patchless
build:  1.6.6-19700101080000-PRISTINE-.usr.src.kernels.2.6.18-128.1.16.el5-x86_64.-2.6.18-128.1.16.el5


Consider this setup:

- one task creates a "foo.working" directory, does all its work
inside, then renames it to "foo.done"

- another task polls, waiting for "foo.working" to disappear.  All of
this occurs on one node.

- the problem: the rename occurred, but "foo.working" remains as a valid dentry!
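
For anyone who wants to try reproducing this, here is a minimal sketch of the setup above. It is not the actual job: the names worker, wait_for_disappearance and run_once are made up for illustration, the "work" is a single small file, and the poller is given a timeout so that it cannot spin forever the way the real one did.

```python
import os
import threading
import time


def worker(base, started, name="foo"):
    """Create <name>.working, do some work inside it, then rename it to <name>.done."""
    working = os.path.join(base, name + ".working")
    done = os.path.join(base, name + ".done")
    os.mkdir(working)
    started.set()                      # tell the poller the directory exists
    with open(os.path.join(working, "output.txt"), "w") as f:
        f.write("result\n")            # stand-in for the real work
    time.sleep(0.2)
    os.rename(working, done)


def wait_for_disappearance(path, timeout=5.0, interval=0.05):
    """Poll with stat() until path is gone; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            os.stat(path)
        except FileNotFoundError:
            return True
        time.sleep(interval)
    return False


def run_once(base, name="foo"):
    """Run worker and poller on the same node; True if the poller saw the rename."""
    started = threading.Event()
    t = threading.Thread(target=worker, args=(base, started, name))
    t.start()
    started.wait()
    seen = wait_for_disappearance(os.path.join(base, name + ".working"))
    t.join()
    return seen
```

Calling run_once() on a scratch directory inside the Lustre mount should return True on a healthy client. A False return means stat() kept succeeding on the old name until the timeout, which is the behaviour described here.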


Witness:

node1$ ls
tape18            tapeLabel_19.txt  tid64.done     tid67.working  tid70.working
tape19.working    tid62.done        tid65.done     tid68.working
tapeLabel_18.txt  tid63.done        tid66.working  tid69

-- Note the absence of a "tid65.working" directory

node1$ stat tid65.working
  File: `tid65.working'
  Size: 12288           Blocks: 24         IO Block: 4096   directory
Device: 6d48dd40h/1833491776d   Inode: 76317251    Links: 3
Access: (2775/drwxrwsr-x)  Uid: ( 3005/ stuartm)   Gid: ( 2000/    prod)
Access: 2009-11-18 17:14:26.000000000 +0800
Modify: 2009-11-18 16:08:01.000000000 +0800
Change: 2009-11-18 16:08:01.000000000 +0800

This is unique to node1.  On node2:

node2$ stat tid65.working
stat: cannot stat `tid65.working': No such file or directory
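
The discrepancy on node1 (the name is missing from the listing, yet stat succeeds) can also be checked programmatically. This is only a sketch; name_state and looks_stale are hypothetical helper names, and the listing and the stat are two separate calls, so a rename that lands between them can produce a false positive.

```python
import os


def name_state(parent, name):
    """Return (listed, statable) for name inside the directory parent."""
    listed = name in os.listdir(parent)
    try:
        os.stat(os.path.join(parent, name))
        statable = True
    except FileNotFoundError:
        statable = False
    return listed, statable


def looks_stale(parent, name):
    """True when stat() succeeds on a name that readdir no longer returns."""
    listed, statable = name_state(parent, name)
    return statable and not listed
```

On node1 in the state shown above, looks_stale(".", "tid65.working") would be expected to return True; on node2 it would return False.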


Attached is an lnet.debug=-1 log of the stat on node1, in which we can
see it revalidating the dentry for a directory that no longer exists.

A snapshot of that lock in the DLM cache reveals nothing obviously abnormal:

00010000:00010000:7:1258537200.873945:0:30794:0:(ldlm_resource.c:1116:ldlm_resource_dump()) --- Resource: ffff810133749500 (76317251/3438387721/0/0) (rc: 3)
00010000:00010000:7:1258537200.873947:0:30794:0:(ldlm_resource.c:1120:ldlm_resource_dump()) Granted locks:
00010000:00010000:7:1258537200.873948:0:30794:0:(ldlm_lock.c:1729:ldlm_lock_dump())  -- Lock dump: ffff8101ac7d2c00/0x3b4ffa12a31eb406 (rc: 1) (pos: 1) (pid: 28837)
00010000:00010000:7:1258537200.873950:0:30794:0:(ldlm_lock.c:1742:ldlm_lock_dump())   Node: NID 172.16.0.251@tcp (rhandle: 0x3ad2b5ae2b9e570a)
00010000:00010000:7:1258537200.873951:0:30794:0:(ldlm_lock.c:1746:ldlm_lock_dump())   Resource: ffff810133749500 (76317251/3438387721)
00010000:00010000:7:1258537200.873953:0:30794:0:(ldlm_lock.c:1751:ldlm_lock_dump())   Req mode: CR, grant mode: CR, rc: 1, read: 0, write: 0 flags: 0x0
00010000:00010000:7:1258537200.873954:0:30794:0:(ldlm_lock.c:1765:ldlm_lock_dump())   Bits: 0x3
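
For readers decoding the "Bits:" field in the dump, a small sketch follows. The bit values are an assumption: LOOKUP = 0x1, UPDATE = 0x2 and OPEN = 0x4 are believed to match the MDS_INODELOCK_* definitions in lustre_idl.h, but they should be verified against the 1.6.6 source before drawing conclusions. decode_inodebits is a made-up helper name.

```python
# ASSUMED inodebits values; verify against lustre_idl.h in the 1.6.6 tree.
ASSUMED_INODEBITS = {
    0x1: "LOOKUP",
    0x2: "UPDATE",
    0x4: "OPEN",
}


def decode_inodebits(bits):
    """Return the names of the assumed inodebits set in bits, lowest bit first."""
    names = [name for value, name in sorted(ASSUMED_INODEBITS.items()) if bits & value]
    unknown = bits & ~sum(ASSUMED_INODEBITS)
    if unknown:
        names.append("unknown(0x%x)" % unknown)
    return names


print(decode_inodebits(0x3))
```

Under that assumption, 0x3 decodes to LOOKUP plus UPDATE, i.e. the client still holds a granted lock covering name lookup for the old directory.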


We stopped the job when it became clear that it would never finish.
Eventually that lock did disappear -- likely just due to normal DLM
turnover -- and the problem resolved itself.  If the task had been
allowed to continue, however, constantly stat()ing that dead
directory, the lock would have remained at the bottom of the LRU --
and thus it would be an effectively infinite loop!
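
The LRU argument above can be illustrated with a toy model. This is not the Lustre DLM's LRU; it is a plain least-recently-used cache, and it assumes (as the paragraph above does) that every stat() of the dead directory refreshes the lock's position. ToyLRU and stale_lock_survives are invented names.

```python
from collections import OrderedDict


class ToyLRU:
    """A fixed-capacity LRU; the oldest entry is evicted first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # ordered oldest to most recently used

    def touch(self, key):
        """Insert key or mark it most recently used; return the evicted key, if any."""
        if key in self.entries:
            self.entries.move_to_end(key)
            return None
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            evicted, _ = self.entries.popitem(last=False)
            return evicted
        return None


def stale_lock_survives(other_locks, poll_every, capacity=4):
    """Return True if the 'stale' lock is never evicted while other_locks new
    locks arrive, with 'stale' touched after every poll_every arrivals
    (poll_every=0 means it is never touched again)."""
    lru = ToyLRU(capacity)
    lru.touch("stale")
    for i in range(1, other_locks + 1):
        if lru.touch("lock%d" % i) == "stale":
            return False
        if poll_every and i % poll_every == 0:
            lru.touch("stale")
    return True
```

In this model, stale_lock_survives(100, poll_every=1) stays True no matter how many other locks pass through, while stale_lock_survives(100, poll_every=0) turns False once enough new locks arrive, which matches the lock vanishing only after the polling job was stopped.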


In the interest of full disclosure, there were a few of these in
dmesg.  But they predate even the creation of the parent directory by
several days, so I think it's very unlikely that they are related:

BUG: warning at fs/inotify.c:202/set_dentry_child_flags() (Tainted: G     )

Call Trace:
 [<ffffffff800f2777>] set_dentry_child_flags+0xef/0x14d
 [<ffffffff800f280d>] remove_watch_no_event+0x38/0x47
 [<ffffffff800f2834>] inotify_remove_watch_locked+0x18/0x3b
 [<ffffffff800f296f>] inotify_rm_wd+0x8d/0xb6
 [<ffffffff800f2ee5>] sys_inotify_rm_watch+0x46/0x63
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0


Has this cheerful missive induced an "A ha!" moment in anyone, that
would explain this?  Have I overlooked something important?

Much like those given by your own almost-forgotten uncles, this gift
was essentially a pair of itchy wool socks.  I hope you will forgive
me.

Cheers,

-p

(nobody handles the moderator queue any more, eh?)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lustre-debug.log
Type: text/x-log
Size: 12507 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20091119/fb242590/attachment.bin>

