[Lustre-discuss] e2fsck bugs

Samuel Aparicio saparicio at bccrc.ca
Thu Aug 16 21:59:44 PDT 2012


We have come across two instances of what may be e2fsck bugs.

The situation arose while trying to repair some OSTs that suffered outages.
The system is running lustre-2.1.2 (the latest maintenance release).
e2fsprogs is 1.42.3.wc3.


case1:

e2fsck reports the following on an OST:

e2fsck 1.42.3.wc3 (15-Aug-2012)
lustre-OST0001: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Entry '5753172' in /O/0/d20 (31064071) has deleted/unused inode 6178421.  Clear? yes

Entry '5753173' in /O/0/d21 (31064072) has deleted/unused inode 6178422.  Clear? yes

Entry '5753175' in /O/0/d23 (31096834) has deleted/unused inode 6178424.  Clear? yes

Entry '5753174' in /O/0/d22 (31096833) has deleted/unused inode 6178423.  Clear? yes

Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

lustre-OST0001: ***** FILE SYSTEM WAS MODIFIED *****
lustre-OST0001: 3799906/32431488 files (1.2% non-contiguous), 2863449917/8302436879 blocks

However, when the disk is remounted on the OSS, the following appears after a short interval:

Lustre: lustre-OST0001: sending delayed replies to recovered clients
Lustre: lustre-OST0001: received MDS connection from 10.9.89.51 at tcp
Lustre: Skipped 1 previous similar message
LDISKFS-fs error (device etherd!e9.0): ldiskfs_lookup: deleted inode referenced: 6178422
Aborting journal on device etherd!e9.24p2.
LDISKFS-fs (etherd!e9.0): Remounting filesystem read-only
LustreError: 14555:0:(filter.c:1506:filter_fid2dentry()) lustre-OST0001: object 5753173:0 lookup error: rc -5
LustreError: 14555:0:(filter.c:3129:__filter_oa2dentry()) filter_setattr error looking up object: 5753173:0
LustreError: 14551:0:(llog_cat.c:485:llog_cat_process_thread()) llog_cat_process() failed -5

It seems the dangling entry has not been fixed: inode 6178422, which e2fsck cleared above, is still being looked up. It would appear we have no way to fix this disk in its current state.
e2fsck will not rectify the issue. Is this a bug, or a feature of a terminally damaged disk?
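
To make the overlap between the two logs explicit, here is a minimal sketch that cross-checks the inodes e2fsck cleared against the inodes the kernel later complains about. It is only an illustration: the function names and regular expressions are assumptions written for the exact log lines quoted in this message, and it is not part of e2fsprogs or Lustre.

```python
import re

# Lines copied from the e2fsck run and the kernel log quoted in this message.
E2FSCK_OUTPUT = """\
Entry '5753172' in /O/0/d20 (31064071) has deleted/unused inode 6178421.  Clear? yes
Entry '5753173' in /O/0/d21 (31064072) has deleted/unused inode 6178422.  Clear? yes
Entry '5753175' in /O/0/d23 (31096834) has deleted/unused inode 6178424.  Clear? yes
Entry '5753174' in /O/0/d22 (31096833) has deleted/unused inode 6178423.  Clear? yes
"""

KERNEL_LOG = """\
LDISKFS-fs error (device etherd!e9.0): ldiskfs_lookup: deleted inode referenced: 6178422
LustreError: 14555:0:(filter.c:1506:filter_fid2dentry()) lustre-OST0001: object 5753173:0 lookup error: rc -5
"""

# Hypothetical patterns, fitted to the message formats shown above.
ENTRY_RE = re.compile(
    r"Entry '(\d+)' in (\S+) \((\d+)\) has deleted/unused inode (\d+)\."
)
LOOKUP_RE = re.compile(r"deleted inode referenced: (\d+)")


def cleared_entries(text):
    """Map inode number -> (entry name, directory) for entries e2fsck cleared."""
    return {
        int(m.group(4)): (m.group(1), m.group(2))
        for m in ENTRY_RE.finditer(text)
    }


def referenced_deleted_inodes(text):
    """Collect inode numbers from 'deleted inode referenced' kernel errors."""
    return {int(m.group(1)) for m in LOOKUP_RE.finditer(text)}


def still_referenced(e2fsck_text, kernel_text):
    """Return cleared entries whose inode is still looked up after remount."""
    cleared = cleared_entries(e2fsck_text)
    referenced = referenced_deleted_inodes(kernel_text)
    return {ino: cleared[ino] for ino in referenced if ino in cleared}


if __name__ == "__main__":
    hits = still_referenced(E2FSCK_OUTPUT, KERNEL_LOG)
    for ino, (entry, directory) in sorted(hits.items()):
        print(f"inode {ino}: entry {entry} in {directory} "
              "was cleared by e2fsck but is still looked up")
```

Run against the full e2fsck output and the OSS syslog, this would list every entry that e2fsck claims to have cleared but that ldiskfs still trips over after the remount.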

case2:

An e2fsck of a disk that was cleanly unmounted, but came back up with errors, reports some inodes with multiply-claimed blocks.
However, e2fsck reports the following internal errors when trying to delete them:

File /O/0/d1/4921697 (inode #14123014, mod time Wed Aug 15 10:45:12 2012) 
  has 666 multiply-claimed block(s), shared with 1 file(s):
        /O/0/d11 (inode #30900230, mod time Thu Aug 16 17:49:45 2012)
Delete file? yes

delete_file_block: internal error: can't find dup_blk for 7910459945

File ??? (inode #14123015, mod time Wed Aug 15 10:27:33 2012) 
  has 648 multiply-claimed block(s), shared with 1 file(s):
        /O/0/d12 (inode #30900231, mod time Thu Aug 16 17:49:45 2012)
Delete file? yes

delete_file_block: internal error: can't find dup_blk for 7910459968

File ??? (inode #14123016, mod time Wed Aug 15 10:45:12 2012) 
  has 657 multiply-claimed block(s), shared with 1 file(s):
        /O/0/d13 (inode #30900232, mod time Thu Aug 16 17:49:45 2012)
Delete file? yes

delete_file_block: internal error: can't find dup_blk for 7910459957
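
For anyone wanting to tabulate these reports, the following is a rough sketch that pairs each file with the inode it shares blocks with and the block number from the internal error. The parser, its field names, and its regular expressions are assumptions fitted to the output quoted above; it does not call into e2fsprogs.

```python
import re

# Output copied from the e2fsck run quoted in this message.
FSCK_OUTPUT = """\
File /O/0/d1/4921697 (inode #14123014, mod time Wed Aug 15 10:45:12 2012)
  has 666 multiply-claimed block(s), shared with 1 file(s):
        /O/0/d11 (inode #30900230, mod time Thu Aug 16 17:49:45 2012)
Delete file? yes

delete_file_block: internal error: can't find dup_blk for 7910459945

File ??? (inode #14123015, mod time Wed Aug 15 10:27:33 2012)
  has 648 multiply-claimed block(s), shared with 1 file(s):
        /O/0/d12 (inode #30900231, mod time Thu Aug 16 17:49:45 2012)
Delete file? yes

delete_file_block: internal error: can't find dup_blk for 7910459968
"""

# Hypothetical patterns, fitted to the message formats shown above.
FILE_RE = re.compile(r"^File (\S+) \(inode #(\d+),")
COUNT_RE = re.compile(r"has (\d+) multiply-claimed block\(s\)")
SHARED_RE = re.compile(r"^\s+(\S+) \(inode #(\d+),")
ERROR_RE = re.compile(r"can't find dup_blk for (\d+)")


def parse_reports(text):
    """Return one dict per 'File ...' report, in the order they appear."""
    reports = []
    current = None
    for line in text.splitlines():
        m = FILE_RE.match(line)
        if m:
            # A new report starts; later lines attach to it.
            current = {
                "path": m.group(1),
                "inode": int(m.group(2)),
                "dup_blocks": None,
                "shared_with": [],
                "failed_block": None,
            }
            reports.append(current)
            continue
        if current is None:
            continue
        m = COUNT_RE.search(line)
        if m:
            current["dup_blocks"] = int(m.group(1))
            continue
        m = SHARED_RE.match(line)
        if m:
            current["shared_with"].append((m.group(1), int(m.group(2))))
            continue
        m = ERROR_RE.search(line)
        if m:
            current["failed_block"] = int(m.group(1))
    return reports


if __name__ == "__main__":
    for r in parse_reports(FSCK_OUTPUT):
        print(r["inode"], r["dup_blocks"], r["shared_with"], r["failed_block"])
```

A table like this is probably the easiest thing to attach to a bug report, since it lines up each affected inode with the block that delete_file_block could not find.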

Professor Samuel Aparicio BM BCh PhD FRCPath
Nan and Lorraine Robertson Chair UBC/BC Cancer Agency
675 West 10th, Vancouver V5Z 1L3, Canada.
office: +1 604 675 8200 lab website http://molonc.bccrc.ca




