[Lustre-discuss] fsck of OST problems - endless loop restarting pass 1

Andreas Dilger adilger at sun.com
Sat Dec 5 19:19:58 PST 2009


On 2009-12-03, at 20:19, 恩强周 wrote:
> Hi all,
> I have also hit ldiskfs problems. Two of our OSTs report messages like this:
> LDISKFS-fs: group 22879: 30128 blocks in bitmap, 29885 in gd
> LDISKFS-fs: group 22810: 29150 blocks in bitmap, 29242 in gd
> LDISKFS-fs: group 22846: 28278 blocks in bitmap, 28324 in gd

I believe this is a bug that was already fixed in newer Lustre releases.
You should run the Lustre-patched "e2fsck -f" on the device while it is
unmounted.

> Does this mean ldiskfs will become corrupted at some point?
>
> Also, one OST reported messages like "Remounting ... read-only", so
> some files could not be written at that time. We ran e2fsck to fix
> it, but the error has now been reported again.
> We have found that ldiskfs seems less stable since 1.6 (1.4 was
> better than 1.6).
> We are worried about problems such as filesystem corruption. Can
> anyone give some suggestions?

You should update to a newer version of Lustre.

> 2009/12/4 Craig Prescott <prescott at hpc.ufl.edu>
> Craig Prescott wrote:
> > Andreas Dilger wrote:
> >> Hmm, the code shouldn't be checking the checksums if the uninit_bg
> >> feature is not enabled.  I believe this was fixed in ext4 already:
> >>
> >> In ldiskfs_group_desc_csum_verify(), change it to:
> >>
> >> int ldiskfs_group_desc_csum_verify(struct ext4_sb_info *sbi,
> >>                                    __u32 block_group,
> >>                                    struct ext4_group_desc *gdp)
> >> {
> >>         /* only verify the checksum if GDT_CSUM (uninit_bg) is enabled */
> >>         if ((sbi->s_es->s_feature_ro_compat &
> >>              cpu_to_le32(LDISKFS_FEATURE_RO_COMPAT_GDT_CSUM)) &&
> >>             (gdp->bg_checksum != ldiskfs_group_desc_csum(sbi, block_group,
> >>                                                          gdp)))
> >>                 return 0;
> >>         return 1;
> >> }
> >
> > Ok, thanks.  I'll try that.
> >
> <snip>
> > Again, I really appreciate the help, and will let the list know
> > how it goes.
>
> Sadly, we didn't have any luck with this. We had written off the OST
> in our minds anyway, so to get any of the data back would have been a
> windfall.
>
> It wouldn't mount as ldiskfs with group descriptor checksum verification disabled:
>
> Dec  3 10:58:05 tebow2 kernel: LDISKFS-fs error (device dm-7): ldiskfs_check_descriptors: Block bitmap for group 10112 not in group (block 484237063)!
> Dec  3 10:58:05 tebow2 kernel: LDISKFS-fs: group descriptors corrupted!
>
> Disabling that check and trying to mount yielded this one:
>
> Dec  3 11:01:13 tebow2 kernel: LDISKFS-fs error (device dm-7): ldiskfs_check_descriptors: Inode bitmap for group 10112 not in group (block 14342712)!
> Dec  3 11:01:13 tebow2 kernel: LDISKFS-fs: group descriptors corrupted!
>
> Disabling that check yielded this one:
>
> Dec  3 11:01:59 tebow2 kernel: LDISKFS-fs error (device dm-7): ldiskfs_check_descriptors: Inode table for group 10112 not in group (block 3538357782)!
> Dec  3 11:01:59 tebow2 kernel: LDISKFS-fs: group descriptors corrupted!
>
> All these messages were seen repeatedly in our fsck attempts. If we had
> been able to get past this group, several thousand more would have
> followed.
>
> Disabling the check that the inode table lies within the group:
>
> Dec  3 11:02:35 tebow2 kernel: ldiskfs: No journal on filesystem on dm-7
>
> At that point we tried to rewrite the superblocks with mkfs.lustre and
> --mkfsoptions="-S", which panicked the OSS. That is when we gave up.
>
> Though it didn't work out this time, we'll be in a better position to
> succeed if this ever happens again.
>
> Thanks,
> Craig Prescott
> UF HPC Center
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
