[Lustre-discuss] fsck of OST problems - endless loop restarting pass 1

恩强周 eqzhou at gmail.com
Thu Dec 3 19:19:37 PST 2009


Hi all,
I have also hit ldiskfs problems. Two of our OSTs report messages like these:
LDISKFS-fs: group 22879: 30128 blocks in bitmap, 29885 in gd
LDISKFS-fs: group 22810: 29150 blocks in bitmap, 29242 in gd
LDISKFS-fs: group 22846: 28278 blocks in bitmap, 28324 in gd
...
Does this mean the ldiskfs filesystem will become corrupted at some point?

Also, one OST reported messages like "Remounting ... read-only", so some
files could not be written at that time. We ran e2fsck to fix it, but the
errors are being reported again now.
We have found that ldiskfs seems less stable since 1.6 (1.4 was better than
1.6), and we are worried about problems such as filesystem corruption. Can
anyone give some suggestions?


On 2009/12/4, Craig Prescott <prescott at hpc.ufl.edu> wrote:

> Craig Prescott wrote:
> > Andreas Dilger wrote:
> >> Hmm, the code shouldn't be checking the checksums if the uninit_bg
> >> feature is not enabled.  I believe this was fixed in ext4 already:
> >>
> >> in ldiskfs_group_desc_csum_verify() change it to be:
> >>
> >> int ldiskfs_group_desc_csum_verify(struct ext4_sb_info *sbi,
> >>                                    __u32 block_group,
> >>                                    struct ext4_group_desc *gdp)
> >> {
> >>         if ((sbi->s_es->s_feature_ro_compat &
> >>              cpu_to_le32(LDISKFS_FEATURE_RO_COMPAT_GDT_CSUM)) &&
> >>             (gdp->bg_checksum != ldiskfs_group_desc_csum(sbi, block_group, gdp)))
> >>                 return 0;
> >>         return 1;
> >> }
> >
> > Ok, thanks.  I'll try that.
> >
> <snip>
> > Again, I really appreciate the help, and will let the list know how it
> > goes.
>
> Sadly, we didn't have any luck with this. We had written off the OST in
> our minds anyway, so getting any of the data back would have been a
> windfall.
>
> It wouldn't mount as ldiskfs with the group descriptor checksum check disabled:
>
> Dec  3 10:58:05 tebow2 kernel: LDISKFS-fs error (device dm-7): ldiskfs_check_descriptors: Block bitmap for group 10112 not in group (block 484237063)!
> Dec  3 10:58:05 tebow2 kernel: LDISKFS-fs: group descriptors corrupted!
>
> Disabling that check and trying to mount yielded this one:
>
> Dec  3 11:01:13 tebow2 kernel: LDISKFS-fs error (device dm-7): ldiskfs_check_descriptors: Inode bitmap for group 10112 not in group (block 14342712)!
> Dec  3 11:01:13 tebow2 kernel: LDISKFS-fs: group descriptors corrupted!
>
> Disabling that check yielded this one:
>
> Dec  3 11:01:59 tebow2 kernel: LDISKFS-fs error (device dm-7): ldiskfs_check_descriptors: Inode table for group 10112 not in group (block 3538357782)!
> Dec  3 11:01:59 tebow2 kernel: LDISKFS-fs: group descriptors corrupted!
>
> All these messages were seen repeatedly in our fsck attempts.  If we had
> been able to get past this group, several thousand more would have
> followed.
>
> Disabling the inode-table-in-group check as well yielded:
>
> Dec  3 11:02:35 tebow2 kernel: ldiskfs: No journal on filesystem on dm-7
>
> We then tried to rewrite the superblocks with mkfs.lustre and
> --mkfsoptions="-S", which panicked the OSS. At that point, we gave up.
>
> Though it didn't work out this time, we'll be in a better position to
> succeed if this ever happens again.
>
> Thanks,
> Craig Prescott
> UF HPC Center
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>

