[Lustre-discuss] Recovery from Hardware Failure

Cliff White cliffw at whamcloud.com
Mon Feb 7 15:00:47 PST 2011


You should not need to run lfsck if the initial e2fsck runs come back clean.
cliffw
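
One way to make "come back clean" concrete is to look at e2fsck's exit status, which is 0 when no errors were found and 4 when errors were left uncorrected (as happens with -n on a damaged filesystem). The helper names below are hypothetical, and treating only status 0 as clean is an assumption of this sketch, not something stated in the thread.

```shell
# Hypothetical helpers: decide whether the lfsck coherency check can be skipped.
# Assumption: "clean" means e2fsck exited with status 0 (no errors found).

# fsck_is_clean STATUS: succeeds only when the given e2fsck exit status is 0.
fsck_is_clean() {
    [ "$1" -eq 0 ]
}

# all_clean STATUS...: succeeds only when every collected status is clean.
all_clean() {
    for status in "$@"; do
        fsck_is_clean "$status" || return 1
    done
    return 0
}

# Intended use (not run here): record $? after each read-only check, e.g.
#   e2fsck -vfn /dev/$MDTDEV; mdt_status=$?
# then call: all_clean "$mdt_status" "$ost1_status" ... || echo "run lfsck"
```

If any status is nonzero, the sketch takes the cautious path and suggests going on to the lfsck step.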


On Mon, Feb 7, 2011 at 1:16 PM, Joe Digilio <jgd-lustre at metajoe.com> wrote:

> Last week we experienced a major hardware failure (disk controller)
> that brought down our system hard.  Now that I have the replacement
> controller, I want to make sure I recover correctly.  Below is the
> procedure I plan to follow based on what I've gathered from the
> Operations Manual.
>
> Any comments?
> Do I need to create the mds/ost DBs AFTER ll_recover_lost_found_objs?
>
> Thanks!
> -Joe
>
>
> ###MDT Recovery
> # Capture fs state before doing anything
> e2fsck -vfn /dev/$MDTDEV
> # "safe" repair
> e2fsck -vfp /dev/$MDTDEV
> # Verify no more problems and generate mdsdb
> e2fsck -vfn --mdsdb /tmp/mdsdb /dev/$MDTDEV
>
> ###OST Recovery
> foreach OST
>    # Capture fs state before doing anything
>    e2fsck -vfn /dev/$OSTDEV
>    # "safe" repair
>    e2fsck -vfp /dev/$OSTDEV
>    # Verify no more problems and generate ostdb
>    e2fsck -vfn --mdsdb /tmp/mdsdb --ostdb /tmp/ostXdb /dev/$OSTDEV
>
> ### Recover lost+found Objects
> foreach OST
>    mount -t ldiskfs /dev/$OSTDEV /mnt/ost
>    ll_recover_lost_found_objs -v -d /mnt/ost/lost+found
>    # Unmount before moving on to the next OST
>    umount /mnt/ost
>
> ### Coherency Check
> lfsck -n -v --mdsdb /tmp/mdsdb --ostdb /tmp/ost1db,/tmp/ost2db,...,/tmp/ostNdb /lustre
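
The "foreach OST" steps quoted above are pseudocode. A minimal dry-run sketch of that loop in POSIX shell follows; it only prints the commands so the plan can be reviewed before anything touches a device. The function name, the example device names, and the /tmp/ostNdb numbering are assumptions for illustration.

```shell
# Hypothetical dry-run helper: print the per-OST e2fsck sequence from the
# quoted procedure without executing anything.
# Usage: plan_ost_checks MDSDB_PATH OSTDEV...
plan_ost_checks() {
    mdsdb=$1
    shift
    i=1
    for dev in "$@"; do
        # Capture fs state before doing anything
        echo "e2fsck -vfn /dev/$dev"
        # "safe" repair
        echo "e2fsck -vfp /dev/$dev"
        # Verify no more problems and generate the per-OST database
        echo "e2fsck -vfn --mdsdb $mdsdb --ostdb /tmp/ost${i}db /dev/$dev"
        i=$((i + 1))
    done
}

# Example (device names are placeholders):
#   plan_ost_checks /tmp/mdsdb sdb sdc
```

Printing rather than running keeps the sketch safe to try on any machine; the output can be inspected and then executed by hand on the server that owns each OST.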