[Lustre-discuss] OST error

Colin Faber cfaber at gmail.com
Thu Dec 2 13:05:53 PST 2010


Hi Bob,

If you're seeing the same errors on the same disk after an e2fsck run, and
e2fsck isn't catching them, you may be hitting an edge case that e2fsck
doesn't handle properly. However, if you're seeing different errors, and
e2fsck did catch the earlier ones, chances are you're looking at a
hardware failure somewhere.
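One rough way to apply that rule of thumb is to compare which block groups ldiskfs complains about before and after the e2fsck. The sketch below is a hypothetical helper (the names `parse_bitmap_errors` and `compare_runs` are made up for illustration). It assumes Python 3.9+ and log text with one message per line, as in /var/log/messages, not the line-wrapped copies in this email.

```python
import re
from typing import NamedTuple

# Matches one ldiskfs_mb_check_ondisk_bitmap message per line. The kernel
# prints no space between the group number and "corrupted", so the pattern
# allows optional whitespace there.
BITMAP_RE = re.compile(
    r"LDISKFS-fs error \(device (?P<dev>[^)]+)\): "
    r"ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group (?P<group>\d+)\s*"
    r"corrupted: (?P<bitmap>\d+) blocks free in bitmap, (?P<gd>\d+) - in gd"
)


class BitmapError(NamedTuple):
    device: str
    group: int
    free_in_bitmap: int
    free_in_gd: int


def parse_bitmap_errors(log_text: str) -> list[BitmapError]:
    """Extract every bitmap/group-descriptor mismatch from a log excerpt."""
    return [
        BitmapError(m["dev"], int(m["group"]), int(m["bitmap"]), int(m["gd"]))
        for m in BITMAP_RE.finditer(log_text)
    ]


def compare_runs(before_fsck: str, after_fsck: str) -> dict[str, list[tuple[str, int]]]:
    """Split post-fsck complaints into groups already seen before fsck and new ones."""
    seen = {(e.device, e.group) for e in parse_bitmap_errors(before_fsck)}
    now = {(e.device, e.group) for e in parse_bitmap_errors(after_fsck)}
    return {
        # Same group still complaining: by the rule of thumb above, possibly
        # something e2fsck doesn't repair.
        "recurring": sorted(seen & now),
        # Groups not reported before: fits the hardware-failure explanation better.
        "new": sorted(now - seen),
    }
```

Fed just the two excerpts quoted below (group 639 on Nov 28, group 35406 after the e2fsck), it reports 35406 as new. Two samples are far too few to settle the question, so treat the output as a hint rather than a diagnosis.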

If this is a single disk and you have SMART monitoring enabled, check its
error counters; if it's a RAID device, check the RAID device's error
counters instead.
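For the single-disk case, something along these lines could pull the usual suspects out of `smartctl -A` (from smartmontools). It is a sketch under several assumptions: the attribute list is an illustrative choice rather than anything standard, the parser only understands the ATA attribute table, and /dev/sdk is just the device name from the logs. If sdk is a LUN exported by a hardware RAID controller, smartctl may need a controller-specific `-d` type or may not see the member disks at all, in which case the controller's own tools are the place to look.

```python
import re
import subprocess
import sys

# ATA SMART attributes whose non-zero raw values usually point at media or
# cabling trouble. This selection is an assumption, not an exhaustive list;
# SAS/SCSI drives print a different report that this parser ignores.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reported_Uncorrect",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}


def nonzero_error_counters(smartctl_output: str) -> dict[str, int]:
    """Return watched attributes with a non-zero RAW_VALUE from `smartctl -A` output."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows begin with a numeric ID; RAW_VALUE is the 10th column
        # and may be followed by extra text such as "(Min/Max 20/45)".
        if len(fields) < 10 or not fields[0].isdigit() or fields[1] not in WATCHED:
            continue
        match = re.match(r"\d+", fields[9])
        if match and int(match.group()) > 0:
            found[fields[1]] = int(match.group())
    return found


def main(device: str) -> None:
    # smartctl usually needs root. Its exit status is a bit mask, so a
    # non-zero status is not treated as fatal; we parse whatever it printed.
    try:
        result = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True)
    except FileNotFoundError:
        print("smartctl not found; install smartmontools", file=sys.stderr)
        return
    counters = nonzero_error_counters(result.stdout)
    if counters:
        for name, value in sorted(counters.items()):
            print(f"{device}: {name} = {value}")
    else:
        print(f"{device}: no non-zero watched counters")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "/dev/sdk")
```

Any non-zero reallocated, pending, or offline-uncorrectable count on the OST's disk would support the hardware explanation; all zeros doesn't rule it out.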

-cf


On 12/02/2010 02:00 PM, Bob Ball wrote:
> We were getting errors thrown by an OST.  /var/log/messages contained a
> lot of these:
> 2010-11-28T17:05:34-05:00 umfs06.aglt2.org kernel: [2102640.735927]
> LDISKFS-fs error (device sdk): ldiskfs_mb_check_ondisk_bitmap: on-disk
> bitmap for group 639corrupted: 440 blocks free in bitmap, 439 - in gd
>
> So, I turned off (most) access to the disk via lctl (we have a LOT of
> client machines, and some were missed) and ran into problems. I had to
> use the alternate superblock to e2fsck the disk. Once it was back
> online, I still saw similar messages, so I updated to e2fsprogs 1.41.12
> as suggested elsewhere and repeated the e2fsck.
>
> We're still seeing these, and users report that some files are corrupted
> (bad md5sums). Any other thoughts on what to do about this problem?
>
> [2440763.879143] LDISKFS-fs error (device sdk):
> ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 35406corrupted:
> 1318 blocks free in bitmap, 1317 - in gd
> [2440763.879796]
> [2440763.882724] LustreError:
> 1651027:0:(fsfilt-ldiskfs.c:1333:fsfilt_ldiskfs_write_record()) can't
> read/create block: -28
> [2440763.882736] LustreError:
> 1651027:0:(llog_lvfs.c:116:llog_lvfs_write_blob()) error writing log
> record: rc -28
> [2440763.882789] LustreError:
> 1651002:0:(mgc_request.c:1089:mgc_copy_llog()) Failed to copy remote log
> umt3-OST0019 (-28)
>
> I rebooted to get the whole system back to a clean state, and found the
> same kind of errors repeating.
> [  285.834864] LDISKFS-fs (sdk): warning: mounting fs with errors,
> running e2fsck is recommended
> [  285.852559] LDISKFS-fs (sdk): mounted filesystem with ordered data mode
> [  286.079065] LDISKFS-fs (sdk): warning: mounting fs with errors,
> running e2fsck is recommended
> [  286.096316] LDISKFS-fs (sdk): mounted filesystem with ordered data mode
> [  286.940872] LDISKFS-fs error (device sdk):
> ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 35406corrupted:
> 1318 blocks free in bitmap, 1317 - in gd
> [  286.941693]
> [  286.945224] LustreError:
> 5790:0:(fsfilt-ldiskfs.c:1333:fsfilt_ldiskfs_write_record()) can't
> read/create block: -28
> [  286.945233] LustreError:
> 5790:0:(llog_lvfs.c:116:llog_lvfs_write_blob()) error writing log
> record: rc -28
> [  286.945448] LustreError: 5763:0:(mgc_request.c:1089:mgc_copy_llog())
> Failed to copy remote log umt3-OST0019 (-28)
>
> All help appreciated.
>
> bob
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
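The LustreError lines in the logs above all end in -28. As is conventional in Linux kernel code, that is a negated errno value, and 28 is ENOSPC ("No space left on device") on Linux. Whether that reflects a genuinely full OST or a side effect of the bitmap damage isn't established in this thread. A small, hypothetical helper for decoding such codes, assuming one log message per line as in dmesg or /var/log/messages:

```python
import errno
import os
import re

# Lustre and kernel code report failures as negated errno values, e.g. "rc -28"
# or "(-28)" at the end of a LustreError line.
TRAILING_RC = re.compile(r"[\s(](-\d+)\)?\s*$")


def extract_rcs(log_text: str) -> list[int]:
    """Collect the trailing negative return code from each LustreError line."""
    rcs = []
    for line in log_text.splitlines():
        if "LustreError" not in line:
            continue
        match = TRAILING_RC.search(line)
        if match:
            rcs.append(int(match.group(1)))
    return rcs


def describe(rc: int) -> str:
    """Turn a negated errno such as -28 into its symbolic name and message."""
    code = -rc
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{rc} = -{name} ({os.strerror(code)})"


if __name__ == "__main__":
    sample = ("[2440763.882789] LustreError: 1651002:0:(mgc_request.c:1089:mgc_copy_llog()) "
              "Failed to copy remote log umt3-OST0019 (-28)")
    for rc in extract_rcs(sample):
        print(describe(rc))  # on Linux: -28 = -ENOSPC (No space left on device)
```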


