[Lustre-discuss] Recovery from Hardware Failure

Joe Digilio jgd-lustre at metajoe.com
Fri Feb 11 13:06:33 PST 2011


Cliff, thank you for your help so far.

Unfortunately, the initial e2fsck runs (with "-n") on both the MDT and
the OSTs did not come back clean.  With "-p", the OSTs cleaned up nicely
(in fact, most OST problems went away once the journal was recovered).
On the MDT, however, many files were moved to lost+found.
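
For anyone scripting these checks, the e2fsck exit status can tell a
clean read-only pass from one that found problems.  The sketch below
decodes that status using the bitmask values listed in the e2fsck(8) man
page; the function name describe_e2fsck_status is made up for
illustration and is not part of e2fsprogs.

```shell
#!/bin/sh
# Decode an e2fsck exit status, which is a sum of the flag values
# documented in e2fsck(8). describe_e2fsck_status is a hypothetical helper.
describe_e2fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "clean"
        return 0
    fi
    msg=""
    for pair in \
        "1:errors-corrected" \
        "2:reboot-recommended" \
        "4:errors-left-uncorrected" \
        "8:operational-error" \
        "16:usage-error" \
        "32:cancelled" \
        "128:shared-library-error"
    do
        bit=${pair%%:*}      # numeric flag value
        label=${pair#*:}     # human-readable meaning
        if [ $((status & bit)) -ne 0 ]; then
            msg="$msg $label"
        fi
    done
    # Strip the leading space left by the first append.
    echo "${msg# }"
}

# Example use after a read-only check (device name as in the plan below):
#   e2fsck -vfn /dev/$MDTDEV; describe_e2fsck_status $?
```

With "-n", a status that includes errors-left-uncorrected is the
expected sign that the read-only pass found something to repair.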

When I run "lfsck -d" it never seems to delete the orphans...
Subsequent runs show exactly the same orphans.  Many lines like this
for all OSTs:
[0] zero-length orphan objid 1
[0] zero-length orphan objid 960
[0] zero-length orphan objid 992
lfsck: [0]: pass3 orphan found objid 1207392, 6234112 bytes
lfsck: [0]: pass3 orphan found objid 1207360, 6234112 bytes

Shouldn't those be deleted when using "-d"?  Or am I misunderstanding
the documentation?
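
One way to confirm that successive runs really report the same orphans
is to compare the objids from two saved logs.  A rough sketch, assuming
each run's output was captured to a file and that orphan lines look like
the ones quoted above; the function names and log file names are
hypothetical.

```shell
#!/bin/sh
# Print "ost-index objid" for every orphan line in an lfsck log.
# The two line formats are assumed from the output excerpt above.
orphan_ids() {
    sed -n \
        -e 's/^\[\([0-9]*\)\] zero-length orphan objid \([0-9]*\).*/\1 \2/p' \
        -e 's/^lfsck: \[\([0-9]*\)\]: pass3 orphan found objid \([0-9]*\),.*/\1 \2/p' \
        "$1" | LC_ALL=C sort -u
}

# Print the orphans that appear in both logs, i.e. the ones that the
# first run did not get rid of.
persistent_orphans() {
    before=$(mktemp)
    after=$(mktemp)
    orphan_ids "$1" > "$before"
    orphan_ids "$2" > "$after"
    LC_ALL=C comm -12 "$before" "$after"
    rm -f "$before" "$after"
}

# Example use with two captured runs (file names are placeholders):
#   persistent_orphans lfsck-run1.log lfsck-run2.log
```

Any line this prints is an orphan that survived from the first run to
the second.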

Thanks again!
-Joe


On Mon, Feb 7, 2011 at 17:00, Cliff White <cliffw at whamcloud.com> wrote:
> You should not have to do the lfsck if the initial fsck's come back clean.
> cliffw
>
> On Mon, Feb 7, 2011 at 1:16 PM, Joe Digilio <jgd-lustre at metajoe.com> wrote:
>>
>> Last week we experienced a major hardware failure (disk controller)
>> that brought down our system hard.  Now that I have the replacement
>> controller, I want to make sure I recover correctly.  Below is the
>> procedure I plan to follow based on what I've gathered from the
>> Operations Manual.
>>
>> Any comments?
>> Do I need to create the mds/ost DBs AFTER ll_recover_lost_found_objs?
>>
>> Thanks!
>> -Joe
>>
>>
>> ###MDT Recovery
>> # Capture fs state before doing anything
>> e2fsck -vfn /dev/$MDTDEV
>> # "safe" repair
>> e2fsck -vfp /dev/$MDTDEV
>> # Verify no more problems and generate mdsdb
>> e2fsck -vfn --mdsdb /tmp/mdsdb /dev/$MDTDEV
>>
>> ###OST Recovery
>> foreach OST
>>    # Capture fs state before doing anything
>>    e2fsck -vfn /dev/$OSTDEV
>>    # "safe" repair
>>    e2fsck -vfp /dev/$OSTDEV
>>    # Verify no more problems and generate ostdb
>>    e2fsck -vfn --mdsdb /tmp/mdsdb --ostdb /tmp/ostXdb /dev/$OSTDEV
>>
>> ### Recover lost+found Objects
>> foreach OST
>>    mount -t ldiskfs /dev/$OSTDEV /mnt/ost
>>    ll_recover_lost_found_objs -v -d /mnt/ost/lost+found
>>
>> ### Coherency Check
>> lfsck -n -v --mdsdb /tmp/mdsdb \
>>     --ostdb /tmp/ost1db /tmp/ost2db ... /tmp/ostNdb /lustre
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
>