[Lustre-discuss] Serious error: objid already exists; is this filesystem corrupt?

Chris Walker cwalker at fas.harvard.edu
Thu Nov 4 20:04:53 PDT 2010


Thanks very much to both of you -- the manual method worked perfectly.

Best,
Chris

On 11/4/10 8:35 AM, Bernd Schubert wrote:
> Hello Christopher, hello Alex,
>
> the alternative is to let e2fsck correct LAST_ID. Patches are here:
>
> https://bugzilla.lustre.org/show_bug.cgi?id=22734
>
> and included in our e2fsprogs releases:
>
> http://eu.ddn.com:8080/lustre/lustre/RHEL5/tools/e2fsprogs/
>
> Unfortunately, the patches are not yet in the Oracle e2fsprogs version.
>
> In order to let e2fsck correct it, you will need to create an mdsdb file (the
> hdr is sufficient) and then run:
> e2fsck --mdsdb mdsdb.hdr --ostdb some_irrelevant_file /dev/device
>
> The procedure is similar to the lfsck preparations, although one usually runs
> that with "-n". To let e2fsck (pass6, the db-part) correct the LAST_ID, it
> must *not* run in read-only mode, though.
>
>
> Cheers,
> Bernd
>
>
> On Thursday, November 04, 2010, Alexey Lyashkov wrote:
>> Hi Christopher,
>>
>> you need to delete the lov_objid file on the MDS and set LAST_ID on the OST
>> to 870397. In that case the MDS will reread last_id from the OSTs and refill
>> the lov_objid file, to avoid possible file corruption.
>>
>> On Nov 4, 2010, at 04:22, Christopher Walker wrote:
>>> We recently had a hardware failure on one of our OSTs, which has caused
>>> some major problems for our 1.6.6-based array.
>>>
>>> We're now getting the error:
>>>
>>> Serious error: objid 517386 already exists; is this filesystem corrupt?
>>>
>>> on one of our OSTs.  If I mount this OST as ldiskfs and look in O/0/d*,
>>> the highest objid I see is 870397, considerably higher than 517386.
>>> We've taken this OST through a round of e2fsck
>>> and ll_recover_lost_found_objs, during which it restored a lot of lost
>>> files, and e2fsck on this OST and on the MDT don't currently show any
>>> problems.  Can I simply edit O/0/LAST_ID, set it to 870397, and expect
>>> files with objid between 517386 and 870397 to come back?
>>>
>>> Also, I could be wrong, but it looks like ll_recover_lost_found_objs.c
>>> only looks for lost files up to LAST_ID -- if I reset LAST_ID to 870397,
>>> should I rerun ll_recover_lost_found_objs?
>>>
>>> Many thanks in advance,
>>> Chris
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> Lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
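For anyone following the manual method discussed in this thread, here is a minimal sketch of the two steps involved: finding the highest objid under O/0/d* and writing that value into LAST_ID. It rests on assumptions not stated in the thread: that LAST_ID holds a single 64-bit little-endian integer, and that the OST is mounted as ldiskfs at a hypothetical path such as /mnt/ost. The function names are illustrative only and are not part of any Lustre tool. It would be prudent to try this on a backup copy of LAST_ID first.

```python
import os
import re
import struct


def highest_objid(group_dir):
    """Return the largest numeric object name under group_dir/d*/, or None."""
    best = None
    for sub in os.listdir(group_dir):
        sub_path = os.path.join(group_dir, sub)
        # Only descend into the d0, d1, ... object subdirectories.
        if not (re.fullmatch(r"d[0-9]+", sub) and os.path.isdir(sub_path)):
            continue
        for name in os.listdir(sub_path):
            # Object files are assumed to be named by their decimal objid.
            if re.fullmatch(r"[0-9]+", name):
                objid = int(name)
                if best is None or objid > best:
                    best = objid
    return best


def read_last_id(path):
    """Read LAST_ID, assuming one 64-bit little-endian unsigned integer."""
    with open(path, "rb") as f:
        data = f.read(8)
    if len(data) != 8:
        raise ValueError("LAST_ID is shorter than 8 bytes")
    return struct.unpack("<Q", data)[0]


def write_last_id(path, objid):
    """Overwrite LAST_ID with objid in the assumed 8-byte layout."""
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", objid))
```

With the OST mounted as ldiskfs, one would call highest_objid("/mnt/ost/O/0"), compare the result with read_last_id("/mnt/ost/O/0/LAST_ID"), and only then call write_last_id with the higher value (870397 in the case described in this thread).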



