[Lustre-discuss] Looping in __d_lookup

Alex Lyashkov Alexey.Lyashkov at Sun.COM
Wed May 21 23:22:04 PDT 2008


On Wed, 2008-05-21 at 21:05 +0200, Jakob Goldbach wrote:
> > 
> > kernel is 2.6.23.17 with patchless lustre 1.6.4.3, 
> 
> I'm running 1.6.4.3 patchless as well against a 2.6.18 vanilla kernel.
> Or at least that is what I thought. OpenVz patch effectively makes the
> kernel a 2.6.18++ kernel because they add features from newer kernels in
> their maintained 2.6.18 based kernel.  
> 
> So the lockup in __d_lookup may just relate to newer patchless clients. 
> 
> I got a debug patch from the OpenVz community which indicates dcache
> chain corruption in a lustre code path. 
> 
> Patch snippet is
> 
> --- ./fs/dcache.c.ddebug2	2008-05-21 14:52:15.000000000 +0400
> +++ ./fs/dcache.c	2008-05-21 15:10:06.000000000 +0400
> @@ -1350,6 +1350,18 @@ static void __d_rehash(struct dentry * e
>  {
>  
>   	entry->d_flags &= ~DCACHE_UNHASHED;
> +	if (!spin_is_locked(&dcache_lock)) {
> +		printk(KERN_ERR "Dcache lock is not taken on add\n");
> +		dump_stack();
> +	} else if (list->first != NULL &&
> +			list->first->pprev != &list->first) {
> +		printk(KERN_ERR "Dcache chain corruption:\n");
> +		printk(KERN_ERR "Chain %p --next-> %p\n",
> +				list, list->first);
> +		printk(KERN_ERR "First %p <-pprev- %p\n",
> +				list->first, list->first->pprev);
> +		dump_stack();
> +	}
>   	hlist_add_head_rcu(&entry->d_hash, list);
>  }
> 
> and stack trace 
> 
> [ 6447.548789] Dcache chain corruption:
> [ 6447.549529] Chain ffff8100010de880 --next-> ffff8100b4ce00b0
> [ 6447.550699] First ffff8100b4ce00b0 <-pprev- 0000000000200200
> [ 6447.551711] 
> [ 6447.551713] Call Trace:
> [ 6447.552809]  [<ffffffff8020ae20>] show_trace+0xae/0x360
> [ 6447.553784]  [<ffffffff8020b0e7>] dump_stack+0x15/0x17
> [ 6447.554727]  [<ffffffff8029ee94>] __d_rehash+0x75/0x97
> [ 6447.555797]  [<ffffffff8029ef2a>] d_rehash+0x74/0x91
> [ 6447.556846]  [<ffffffff883b4c6a>] :lustre:ll_revalidate_it+0xa1a/0xd90
> [ 6447.557966]  [<ffffffff883b529c>] :lustre:ll_revalidate_nd+0x2bc/0x360
> [ 6447.559082]  [<ffffffff80295741>] do_lookup+0x15d/0x193
> [ 6447.560142]  [<ffffffff80296fd9>] __link_path_walk+0x409/0x10ac
> [snip]
> 

This patch and backtrace show that the dcache chain was damaged _before_
Lustre was entered: Lustre starts adding an entry at a new position in
the dentry cache and finds an already damaged entry in the list.
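
For illustration, here is a minimal userspace sketch of the invariant the
debug patch tests. The list types are copied from the kernel's hlist, and
the delete helper is modeled on hlist_del_rcu(), which leaves LIST_POISON2
(0x00200200 on kernels of this era) in pprev. That is the same value the
trace prints for the first entry, so the chain head pointed at a node that
had already been unhashed. The functions chain_head_ok() and
stale_head_detected() are hypothetical names, and overwriting chain.first
by hand is only an assumed way to reproduce the symptom, not a diagnosis of
what actually corrupted the chain.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace copies of the kernel's hlist types (include/linux/list.h). */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* Poison value written into pprev of a deleted node (2.6.18-era value). */
#define LIST_POISON2 ((void *)0x00200200)

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Modeled on hlist_del_rcu(): unlink the node, then poison only pprev. */
static void hlist_del_rcu(struct hlist_node *n)
{
	struct hlist_node *next = n->next;
	struct hlist_node **pprev = n->pprev;

	*pprev = next;
	if (next)
		next->pprev = pprev;
	n->pprev = LIST_POISON2;
}

/* The invariant from the debug patch: the first node must point back
 * at the chain head. Hypothetical helper name. */
int chain_head_ok(struct hlist_head *h)
{
	return h->first == NULL || h->first->pprev == &h->first;
}

/* Hypothetical scenario: the head is made to reference a node that was
 * already deleted. Returns 1 if the check catches it and the stale
 * node carries LIST_POISON2, as in the reported trace. */
int stale_head_detected(void)
{
	struct hlist_head chain = { NULL };
	struct hlist_node a, b;

	hlist_add_head(&a, &chain);
	hlist_add_head(&b, &chain);
	assert(chain_head_ok(&chain));	/* healthy chain: b, then a */

	hlist_del_rcu(&b);		/* chain.first is now &a */
	assert(chain_head_ok(&chain));	/* still healthy after a clean delete */

	chain.first = &b;		/* assumed corruption: stale head pointer */
	return !chain_head_ok(&chain) && b.pprev == LIST_POISON2;
}
```

In this sketch the check only fires once the head references the deleted
node, which matches the reading that the damage already existed when
d_rehash() was called and was merely detected there.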


-- 
Alex Lyashkov <Alexey.lyashkov at sun.com>
Lustre Group, Sun Microsystems



