[Lustre-discuss] 1.6.4.1 - LBUG on MDS
Bernd Schubert
bs at q-leap.de
Mon Jan 21 02:23:36 PST 2008
Hello Niklas,
On Monday 21 January 2008 08:09:35 Niklas Edmundsson wrote:
> On Mon, 14 Jan 2008, Johann Lombardi wrote:
> > On Mon, Jan 14, 2008 at 08:02:43AM +0100, Niklas Edmundsson wrote:
> >> Lustre 1.6.4.1 on Ubuntu Dapper with Debian 2.6.18 AMD64 kernel. MDS
> >> LBUG:ed with:
> >>
> >> -------------8<--------------------
> >> Jan 12 10:39:40 LustreError: 6198:0:(mds_reint.c:1512:mds_orphan_add_link()) ASSERTION(inode->i_nlink == 1) failed:dir nlink == 0
> >> Jan 12 10:39:40 LustreError: 6198:0:(mds_reint.c:1512:mds_orphan_add_link()) LBUG
> >> Jan 12 10:39:40 Lustre: 6198:0:(linux-debug.c:168:libcfs_debug_dumpstack()) showing stack for process 6198
> >> Jan 12 10:39:41 LustreError: dumping log to /tmp/lustre-log.1200130781.6198
> >
> > The debian kernel maintainers have probably merged the ext3_link() patch
> > to return -ENOENT when inode->i_nlink is equal to 0. Please note that
> > this patch is included in the RHEL5 kernels (and our RHEL5 series handles
> > this), but not in the 2.6.18.8 vanilla kernel.
> > To fix this, you should add ext3-unlink-race.patch to the 2.6.18 ldiskfs
> > series.
>
> Hmm, ext3-unlink-race.patch didn't apply at all, and looking manually
> I see no obvious place to apply it to.
>
> Diffing the ext3-trees between kernel.org 2.6.18.8 and debian 2.6.18 I
> see no patch that obviously touches ext3_link/ENOENT/i_nlink:
>
> ---------------------8<----------------------------
> diff -rpu /scratch/linux-2.6.18.8/fs/ext3/dir.c ./dir.c
> --- /scratch/linux-2.6.18.8/fs/ext3/dir.c 2007-02-24 00:52:30.000000000 +0100
> +++ ./dir.c 2007-12-22 03:24:00.000000000 +0100
> @@ -151,6 +151,9 @@ static int ext3_readdir(struct file * fi
> ext3_error (sb, "ext3_readdir",
> "directory #%lu contains a hole at offset %lu",
> inode->i_ino, (unsigned long)filp->f_pos);
> + /* corrupt size? Maybe no more blocks to read */
> + if (filp->f_pos > inode->i_blocks << 9)
> + break;
> filp->f_pos += sb->s_blocksize - offset;
> continue;
> }
> diff -rpu /scratch/linux-2.6.18.8/fs/ext3/namei.c ./namei.c
> --- /scratch/linux-2.6.18.8/fs/ext3/namei.c 2007-02-24 00:52:30.000000000 +0100
> +++ ./namei.c 2007-12-22 03:24:00.000000000 +0100
> @@ -551,6 +551,15 @@ static int htree_dirblock_to_tree(struct
> dir->i_sb->s_blocksize -
> EXT3_DIR_REC_LEN(0));
> for (; de < top; de = ext3_next_entry(de)) {
> + if (!ext3_check_dir_entry("htree_dirblock_to_tree", dir, de, bh,
> + (block<<EXT3_BLOCK_SIZE_BITS(dir->i_sb))
> + +((char *)de - bh->b_data))) {
> + /* On error, skip the f_pos to the next block. */
> + dir_file->f_pos = (dir_file->f_pos |
> + (dir->i_sb->s_blocksize - 1)) + 1;
> + brelse (bh);
> + return count;
> + }
> ext3fs_dirhash(de->name, de->name_len, hinfo);
> if ((hinfo->hash < start_hash) ||
> ((hinfo->hash == start_hash) &&
> ---------------------8<----------------------------
>
> So I think that this bug is most likely present when using vanilla
> kernel.org 2.6.18.8 too...
>
> Thoughts/suggestions?
>
> My gut feeling is that the MDS code is relying on some corner case
> behaviour of ext3, and that this behaviour is changing with newer
> kernels...
Could you try this patch? It is what we are using, and it should also be in
Debian's Lustre svn:
diff -r a1bf8dcdfe1f lustre/mds/mds_reint.c
--- a/lustre/mds/mds_reint.c Mon Jul 09 17:00:16 2007 +0200
+++ b/lustre/mds/mds_reint.c Mon Jul 09 17:01:04 2007 +0200
@@ -1481,7 +1481,12 @@ static int mds_orphan_add_link(struct md
* for linking and return real mode back then -bzzz */
mode = inode->i_mode;
inode->i_mode = S_IFREG;
+
+ /* 2.6.21 will refuse to add a link of inode->i_nlink == 0 */
+ inode->i_nlink = 1;
rc = vfs_link(dentry, pending_dir, pending_child);
+ inode->i_nlink--;
+ mark_inode_dirty(inode);
if (rc)
CERROR("error linking orphan %s to PENDING: rc = %d\n",
rec->ur_name, rc);
I didn't like ext3-unlink-race.patch: it removes sanity checks that someone
certainly added for good reasons. That is why I wrote this patch instead.
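
A minimal userspace sketch of the sequence the patch performs, in case it
helps to see it in isolation. The names mock_inode, mock_link and
mock_orphan_add_link are hypothetical stand-ins, not kernel or Lustre
symbols. mock_link only models the behaviour described in the patch comment
(a link call that refuses an inode with i_nlink == 0); it is an assumption
about that behaviour, not the real vfs_link().

```c
#include <errno.h>

/* Hypothetical userspace stand-in for the kernel's struct inode. */
struct mock_inode {
    unsigned int i_nlink;   /* link count */
    int dirty;              /* set where the patch calls mark_inode_dirty() */
};

/* Assumed model of a link call that refuses inodes with no links left. */
static int mock_link(struct mock_inode *inode)
{
    if (inode->i_nlink == 0)
        return -ENOENT;
    inode->i_nlink++;
    return 0;
}

/* Same order of operations as the patch: pretend the orphan has one
 * link, add the link, then drop the pretended link again. */
static int mock_orphan_add_link(struct mock_inode *inode)
{
    int rc;

    inode->i_nlink = 1;
    rc = mock_link(inode);
    inode->i_nlink--;
    inode->dirty = 1;
    return rc;
}
```

In this model, calling mock_link() directly on an inode with i_nlink == 0
fails with -ENOENT, while mock_orphan_add_link() succeeds and leaves
i_nlink at 1, the value the failing assertion in the log expects.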
Cheers,
Bernd
--
Bernd Schubert
Q-Leap Networks GmbH