[Lustre-discuss] 1.6.4.1 - LBUG on MDS

Bernd Schubert bs at q-leap.de
Mon Jan 21 02:23:36 PST 2008


Hello Niklas,

On Monday 21 January 2008 08:09:35 Niklas Edmundsson wrote:
> On Mon, 14 Jan 2008, Johann Lombardi wrote:
> > On Mon, Jan 14, 2008 at 08:02:43AM +0100, Niklas Edmundsson wrote:
> >> Lustre 1.6.4.1 on Ubuntu Dapper with Debian 2.6.18 AMD64 kernel. MDS
> >> LBUG:ed with:
> >>
> >> -------------8<--------------------
> >> Jan 12 10:39:40 LustreError: 6198:0:(mds_reint.c:1512:mds_orphan_add_link()) ASSERTION(inode->i_nlink == 1) failed:dir nlink == 0
> >> Jan 12 10:39:40 LustreError: 6198:0:(mds_reint.c:1512:mds_orphan_add_link()) LBUG
> >> Jan 12 10:39:40 Lustre: 6198:0:(linux-debug.c:168:libcfs_debug_dumpstack()) showing stack for process 6198
> >> Jan 12 10:39:41 LustreError: dumping log to /tmp/lustre-log.1200130781.6198
> >
> > The debian kernel maintainers have probably merged the ext3_link() patch
> > to return -ENOENT when inode->i_nlink is equal to 0. Please note that
> > this patch is included in the RHEL5 kernels (and our RHEL5 series handles
> > this), but not in the 2.6.18.8 vanilla kernel.
> > To fix this, you should add ext3-unlink-race.patch to the 2.6.18 ldiskfs
> > series.
>
> Hmm, ext3-unlink-race.patch didn't apply at all, and looking manually
> I see no obvious place to apply it to.
>
> Diffing the ext3-trees between kernel.org 2.6.18.8 and debian 2.6.18 I
> see no patch that obviously touches ext3_link/ENOENT/i_nlink:
>
> ---------------------8<----------------------------
> diff -rpu /scratch/linux-2.6.18.8/fs/ext3/dir.c ./dir.c
> --- /scratch/linux-2.6.18.8/fs/ext3/dir.c       2007-02-24 00:52:30.000000000 +0100
> +++ ./dir.c     2007-12-22 03:24:00.000000000 +0100
> @@ -151,6 +151,9 @@ static int ext3_readdir(struct file * fi
>                          ext3_error (sb, "ext3_readdir",
>                                  "directory #%lu contains a hole at offset %lu",
>                                  inode->i_ino, (unsigned long)filp->f_pos);
> +                       /* corrupt size?  Maybe no more blocks to read */
> +                       if (filp->f_pos > inode->i_blocks << 9)
> +                               break;
>                          filp->f_pos += sb->s_blocksize - offset;
>                          continue;
>                  }
> diff -rpu /scratch/linux-2.6.18.8/fs/ext3/namei.c ./namei.c
> --- /scratch/linux-2.6.18.8/fs/ext3/namei.c     2007-02-24 00:52:30.000000000 +0100
> +++ ./namei.c   2007-12-22 03:24:00.000000000 +0100
> @@ -551,6 +551,15 @@ static int htree_dirblock_to_tree(struct
>                                             dir->i_sb->s_blocksize -
>                                             EXT3_DIR_REC_LEN(0));
>          for (; de < top; de = ext3_next_entry(de)) {
> +               if (!ext3_check_dir_entry("htree_dirblock_to_tree", dir, de, bh,
> +                                       (block<<EXT3_BLOCK_SIZE_BITS(dir->i_sb))
> +                                               +((char *)de - bh->b_data))) {
> +                       /* On error, skip the f_pos to the next block. */
> +                       dir_file->f_pos = (dir_file->f_pos |
> +                                       (dir->i_sb->s_blocksize - 1)) + 1;
> +                       brelse (bh);
> +                       return count;
> +               }
>                  ext3fs_dirhash(de->name, de->name_len, hinfo);
>                  if ((hinfo->hash < start_hash) ||
>                      ((hinfo->hash == start_hash) &&
> ---------------------8<----------------------------
>
> So I think that this bug is most likely present when using vanilla
> kernel.org 2.6.18.8 too...
>
> Thoughts/suggestions?
>
> My gut feeling is that the MDS code is relying on some corner case
> behaviour of ext3, and that this behaviour is changing with newer
> kernels...

Could you try this patch? It is what we are using, and it should also be in
Debian's Lustre svn:

diff -r a1bf8dcdfe1f lustre/mds/mds_reint.c
--- a/lustre/mds/mds_reint.c	Mon Jul 09 17:00:16 2007 +0200
+++ b/lustre/mds/mds_reint.c	Mon Jul 09 17:01:04 2007 +0200
@@ -1481,7 +1481,12 @@ static int mds_orphan_add_link(struct md
          * for linking and return real mode back then -bzzz */
         mode = inode->i_mode;
         inode->i_mode = S_IFREG;
+
+        /* 2.6.21 will refuse to add a link of inode->i_nlink == 0 */
+        inode->i_nlink = 1;
         rc = vfs_link(dentry, pending_dir, pending_child);
+        inode->i_nlink--;
+        mark_inode_dirty(inode);
         if (rc)
                 CERROR("error linking orphan %s to PENDING: rc = %d\n",
                        rec->ur_name, rc);


I didn't like ext3-unlink-race.patch: it removes sanity checks that someone
certainly added for good reasons. That is why I wrote this patch instead.
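
To make the idea behind the patch easier to follow, here is a minimal
user-space sketch of the same pattern. It is only an illustration under
stated assumptions, not the Lustre or ext3 code: struct mock_inode,
mock_link() and orphan_add_link() are hypothetical stand-ins. mock_link()
imitates a link routine that returns -ENOENT for an inode with
i_nlink == 0 and otherwise increments the count, which is the behaviour
Johann described for kernels carrying the ext3_link() check.

```c
#include <errno.h>

/* Hypothetical stand-in for the kernel's struct inode; only the link
 * count matters for this illustration. */
struct mock_inode {
    unsigned int i_nlink;
};

/* Hypothetical stand-in for vfs_link()/ext3_link() on a kernel that
 * refuses to link an inode whose link count is already zero. */
static int mock_link(struct mock_inode *inode)
{
    if (inode->i_nlink == 0)
        return -ENOENT;     /* the sanity check stays in place */
    inode->i_nlink++;       /* a successful link adds one reference */
    return 0;
}

/* Same pattern as the patch above: pretend the inode has one link for
 * the duration of the call, then drop that artificial reference. */
static int orphan_add_link(struct mock_inode *inode)
{
    int rc;

    inode->i_nlink = 1;     /* let the link routine's check pass */
    rc = mock_link(inode);
    inode->i_nlink--;       /* remove the temporary reference again */
    return rc;
}
```

In this sketch, calling mock_link() directly on an inode with i_nlink == 0
fails with -ENOENT and leaves the count untouched, while
orphan_add_link() succeeds and leaves the count at 1, i.e. only the new
link remains. If the link call fails for some other reason, the count
drops back to 0.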


Cheers,
Bernd


-- 
Bernd Schubert
Q-Leap Networks GmbH


