[Lustre-discuss] renamed directory retains a dentry under its old name?
Phil Schwan
phils at dugeo.com
Thu Nov 19 04:06:27 PST 2009
Hello old friends! I return with a gift, like an almost-forgotten
uncle visiting from a faraway land.
I have an interesting issue, on 1.6.6:
# cat /proc/fs/lustre/version
lustre: 1.6.6
kernel: patchless
build: 1.6.6-19700101080000-PRISTINE-.usr.src.kernels.2.6.18-128.1.16.el5-x86_64.-2.6.18-128.1.16.el5
Consider this setup:
- one task creates a "foo.working" directory, does all its work
inside, then renames it to "foo.done"
- another task polls, waiting for "foo.working" to disappear
- all of this occurs on one node
- the problem: the rename occurred, but "foo.working" remains as a valid dentry!
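
For anyone who wants to try this at home, here is a minimal sketch of the setup in Python. It is not the production job: the names (worker, wait_until_gone, run_once), the use of threads rather than separate processes, and the timeout on the poll are all assumptions made for illustration. To exercise Lustre, "base" would need to be a directory on the Lustre mount.

```python
import os
import tempfile
import threading
import time


def worker(base, name):
    """Create <name>.working, do some work inside it, then rename it to <name>.done."""
    working = os.path.join(base, name + ".working")
    done = os.path.join(base, name + ".done")
    os.mkdir(working)
    with open(os.path.join(working, "output.txt"), "w") as f:
        f.write("work finished\n")
    os.rename(working, done)


def wait_until_gone(path, interval=0.05, timeout=10.0):
    """Poll with stat() until path disappears; return False if it never does."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            os.stat(path)
        except FileNotFoundError:
            return True
        time.sleep(interval)
    return False


def run_once(base):
    """Run the worker and the poller side by side on the same node."""
    t = threading.Thread(target=worker, args=(base, "foo"))
    t.start()
    gone = wait_until_gone(os.path.join(base, "foo.working"))
    t.join()
    return gone


if __name__ == "__main__":
    # Hypothetical scratch location; point this at the Lustre mount to test it.
    with tempfile.TemporaryDirectory() as base:
        print("foo.working disappeared:", run_once(base))
```

On a healthy filesystem run_once() returns True almost immediately. The symptom described here would show up as wait_until_gone() hitting its timeout even though the rename had succeeded.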
Witness:
node1$ ls
tape18 tapeLabel_19.txt tid64.done tid67.working tid70.working
tape19.working tid62.done tid65.done tid68.working
tapeLabel_18.txt tid63.done tid66.working tid69
-- Note the absence of a "tid65.working" directory
node1$ stat tid65.working
File: `tid65.working'
Size: 12288 Blocks: 24 IO Block: 4096 directory
Device: 6d48dd40h/1833491776d Inode: 76317251 Links: 3
Access: (2775/drwxrwsr-x) Uid: ( 3005/ stuartm) Gid: ( 2000/ prod)
Access: 2009-11-18 17:14:26.000000000 +0800
Modify: 2009-11-18 16:08:01.000000000 +0800
Change: 2009-11-18 16:08:01.000000000 +0800
This is unique to node1. On node2:
node2$ stat tid65.working
stat: cannot stat `tid65.working': No such file or directory
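
The inconsistency above (stat succeeds on a name that ls does not show) can be checked for mechanically. The following is a rough sketch only; find_stale_names is a hypothetical helper, and the check is racy by nature, since a name could legitimately be created between the listing and the stat.

```python
import os


def find_stale_names(directory, candidates):
    """Return the candidate names that stat() resolves but the listing omits."""
    listed = set(os.listdir(directory))
    stale = []
    for name in candidates:
        if name in listed:
            continue  # visible in the listing, so nothing odd here
        try:
            os.stat(os.path.join(directory, name))
        except FileNotFoundError:
            continue  # absent from both views, which is consistent
        stale.append(name)  # stat worked although the listing lacks the name
    return stale
```

On node1 one would expect a call such as find_stale_names(".", ["tid65.working"]) to report the name, and on node2 to return an empty list.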
Attached is a lnet.debug=-1 log of the stat on node1, in which we can
see it revalidating the dentry for a directory that no longer exists.
A snapshot of that lock in the DLM cache reveals no obvious pathology:
00010000:00010000:7:1258537200.873945:0:30794:0:(ldlm_resource.c:1116:ldlm_resource_dump())
--- Resource: ffff810133749500 (76317251/3438387721/0/0) (rc: 3)
00010000:00010000:7:1258537200.873947:0:30794:0:(ldlm_resource.c:1120:ldlm_resource_dump())
Granted locks:
00010000:00010000:7:1258537200.873948:0:30794:0:(ldlm_lock.c:1729:ldlm_lock_dump())
-- Lock dump: ffff8101ac7d2c00/0x3b4ffa12a31eb406 (rc: 1) (pos: 1) (pid: 28837)
00010000:00010000:7:1258537200.873950:0:30794:0:(ldlm_lock.c:1742:ldlm_lock_dump())
Node: NID 172.16.0.251 at tcp (rhandle: 0x3ad2b5ae2b9e570a)
00010000:00010000:7:1258537200.873951:0:30794:0:(ldlm_lock.c:1746:ldlm_lock_dump())
Resource: ffff810133749500 (76317251/3438387721)
00010000:00010000:7:1258537200.873953:0:30794:0:(ldlm_lock.c:1751:ldlm_lock_dump())
Req mode: CR, grant mode: CR, rc: 1, read: 0, write: 0 flags: 0x0
00010000:00010000:7:1258537200.873954:0:30794:0:(ldlm_lock.c:1765:ldlm_lock_dump())
Bits: 0x3
We stopped the job when it became clear that it would never finish.
Eventually that lock did disappear -- likely just due to normal DLM
turnover -- and the problem resolved itself. If the task had been
allowed to continue, however, constantly stat()ing that dead
directory, the lock would have remained at the bottom of the LRU --
and thus it would be an effectively infinite loop!
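
One way a polling task could sidestep the loop is to watch the parent directory's listing instead of stat()ing the old name, and to give up after a deadline. This is only a sketch: wait_for_rename is a hypothetical helper, and the idea that the listing stays coherent rests solely on the ls output shown earlier, not on anything verified against Lustre.

```python
import os
import time


def wait_for_rename(parent, old_name, new_name, interval=0.5, timeout=60.0):
    """Wait until the listing of parent shows new_name and not old_name.

    Returns True once the rename is visible in the listing, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = set(os.listdir(parent))
        if new_name in names and old_name not in names:
            return True
        if time.monotonic() >= deadline:
            return False  # bounded wait: never spin forever on a stale entry
        time.sleep(interval)
```

Because it never stat()s the old name, this poller at least should not be the thing keeping that lock fresh, and the timeout turns a silent hang into something a job can report.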
In the interest of full disclosure, there were some of these in
dmesg. But they date from several days before even the parent
directory was created, so I think it's very unlikely that they are
related:
BUG: warning at fs/inotify.c:202/set_dentry_child_flags() (Tainted: G )
Call Trace:
[<ffffffff800f2777>] set_dentry_child_flags+0xef/0x14d
[<ffffffff800f280d>] remove_watch_no_event+0x38/0x47
[<ffffffff800f2834>] inotify_remove_watch_locked+0x18/0x3b
[<ffffffff800f296f>] inotify_rm_wd+0x8d/0xb6
[<ffffffff800f2ee5>] sys_inotify_rm_watch+0x46/0x63
[<ffffffff8005e28d>] tracesys+0xd5/0xe0
Has this cheerful missive induced an "A ha!" moment in anyone, that
would explain this? Have I overlooked something important?
Much like those given by your own almost-forgotten uncles, this gift
was essentially a pair of itchy wool socks. I hope you will forgive
me.
Cheers,
-p
(nobody handles the moderator queue any more, eh?)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lustre-debug.log
Type: text/x-log
Size: 12507 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20091119/fb242590/attachment.bin>