[Lustre-discuss] fsck of OST problems - endless loop restarting pass 1

Craig Prescott prescott at hpc.ufl.edu
Wed Dec 2 16:16:35 PST 2009


Andreas Dilger wrote:
> Hmm, the code shouldn't be checking the checksums if the uninit_bg
> feature is not enabled.  I believe this was fixed in ext4 already:
> 
> in ldiskfs_group_desc_csum_verify() change it to be:
> 
> int ldiskfs_group_desc_csum_verify(struct ext4_sb_info *sbi,
>                                    __u32 block_group,
>                                    struct ext4_group_desc *gdp)
> {
>         if ((sbi->s_es->s_feature_ro_compat &
>              cpu_to_le32(LDISKFS_FEATURE_RO_COMPAT_GDT_CSUM)) &&
>             (gdp->bg_checksum !=
>              ldiskfs_group_desc_csum(sbi, block_group, gdp)))
>                 return 0;
>         return 1;
> }

Ok, thanks.  I'll try that.

Here's what the 1.8.1.1 ldiskfs_group_desc_csum_verify() looks like 
(from lustre-ldiskfs-3.0.9/ldiskfs/super.c):

int ldiskfs_group_desc_csum_verify(struct ldiskfs_sb_info *sbi,
                                   __u32 block_group,
                                   struct ldiskfs_group_desc *gdp)
{
        return (gdp->bg_checksum ==
                ldiskfs_group_desc_csum(sbi, block_group, gdp));
}

(This is from the source tree produced by running 'rpmbuild -bc
lustre-ldiskfs.spec' on
lustre-ldiskfs-3.0.9-2.6.18_128.7.1.el5_lustre.1.8.1.1.src.rpm.)
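To make the difference between the two versions concrete, here is a minimal user-space sketch of the check Andreas suggested. It is not the ldiskfs source: every sketch_* name is hypothetical, the structs carry only the fields the check reads, fields are kept in host byte order (so the kernel's cpu_to_le32() is omitted), and the checksum function is a stand-in rather than the real ldiskfs_group_desc_csum(). The flag value 0x0010 is assumed to match the ext4 GDT_CSUM definition.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match ext4's RO_COMPAT_GDT_CSUM (uninit_bg) flag value. */
#define SKETCH_FEATURE_RO_COMPAT_GDT_CSUM 0x0010u

/* Hypothetical, cut-down stand-ins for the on-disk and in-memory structs. */
struct sketch_super_block { uint32_t s_feature_ro_compat; };
struct sketch_sb_info     { struct sketch_super_block *s_es; };
struct sketch_group_desc  { uint16_t bg_checksum; };

/* Stand-in checksum so the example is self-contained; it is NOT the
 * algorithm used by the real ldiskfs_group_desc_csum(). */
static uint16_t sketch_group_desc_csum(const struct sketch_sb_info *sbi,
                                       uint32_t block_group,
                                       const struct sketch_group_desc *gdp)
{
        (void)sbi;
        (void)gdp;
        return (uint16_t)(0xA5A5u ^ block_group);
}

/* Returns 1 if the descriptor is acceptable, 0 on checksum mismatch.
 * The checksum is only compared when the GDT_CSUM feature is set. */
static int sketch_group_desc_csum_verify(const struct sketch_sb_info *sbi,
                                         uint32_t block_group,
                                         const struct sketch_group_desc *gdp)
{
        if ((sbi->s_es->s_feature_ro_compat &
             SKETCH_FEATURE_RO_COMPAT_GDT_CSUM) &&
            gdp->bg_checksum != sketch_group_desc_csum(sbi, block_group, gdp))
                return 0;
        return 1;
}

/* Exercises the three cases of interest. */
static void sketch_self_check(void)
{
        struct sketch_super_block es = { 0 };
        struct sketch_sb_info sbi = { &es };
        struct sketch_group_desc gd = { 0 };   /* stale checksum */

        /* Feature off: a mismatching checksum is ignored. */
        assert(sketch_group_desc_csum_verify(&sbi, 7, &gd) == 1);

        /* Feature on: the same descriptor is now rejected. */
        es.s_feature_ro_compat = SKETCH_FEATURE_RO_COMPAT_GDT_CSUM;
        assert(sketch_group_desc_csum_verify(&sbi, 7, &gd) == 0);

        /* Feature on with a matching checksum: accepted. */
        gd.bg_checksum = sketch_group_desc_csum(&sbi, 7, &gd);
        assert(sketch_group_desc_csum_verify(&sbi, 7, &gd) == 1);
}
```

The 1.8.1.1 version shown above corresponds to dropping the feature test, so the first case would fail verification even though the filesystem does not have uninit_bg enabled.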

The problematic OST is direct-attached to a running OSS with ldiskfs.ko 
loaded (problematic OST is marked inactive).  I'll have to wait at least 
until tomorrow for an opportunity to try deploying and reloading an 
updated ldiskfs.ko.

Again, I really appreciate the help, and will let the list know how it goes.

Thanks,
Craig Prescott
UF HPC Center




