[Lustre-discuss] Recovery from Hardware Failure
Joe Digilio
jgd-lustre at metajoe.com
Mon Feb 7 13:16:06 PST 2011
Last week we experienced a major hardware failure (disk controller)
that brought down our system hard. Now that I have the replacement
controller, I want to make sure I recover correctly. Below is the
procedure I plan to follow based on what I've gathered from the
Operations Manual.
Any comments?
Do I need to create the mds/ost DBs AFTER running ll_recover_lost_found_objs?
Thanks!
-Joe
###MDT Recovery
# Capture fs state before doing anything
e2fsck -vfn /dev/$MDTDEV
# "safe" repair
e2fsck -vfp /dev/$MDTDEV
# Verify no more problems and generate mdsdb
e2fsck -vfn --mdsdb /tmp/mdsdb /dev/$MDTDEV
# Make /tmp/mdsdb available on every OSS (copy it there or use shared storage)
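With -p, e2fsck only makes repairs it can do safely without prompting, and it exits with a non-zero status when something needs manual attention. Its exit status is a bitmask documented in e2fsck(8). Checking it after each "safe" repair, before generating the mdsdb or an ostdb, avoids building a database from a filesystem that still has problems. A minimal sketch; `fsck_status_ok` and `describe_fsck_status` are hypothetical helper names, not Lustre tools:

```shell
# Hypothetical helpers (not part of e2fsprogs or Lustre).
# e2fsck's exit status is the bitwise OR of these conditions (e2fsck(8)):
#   1 = errors corrected          2 = errors corrected, reboot advised
#   4 = errors left uncorrected   8 = operational error
#   16 = usage or syntax error    32 = canceled by user
#   128 = shared library error

# Succeeds only if no bits other than 1 and 2 are set.
fsck_status_ok() {
    [ $(( $1 & ~3 )) -eq 0 ]
}

# Prints a readable description of every condition set in the status.
describe_fsck_status() (
    status=$1 out=""
    for bit in 1 2 4 8 16 32 128; do
        [ $(( status & bit )) -eq 0 ] && continue
        case $bit in
            1)   msg="errors corrected" ;;
            2)   msg="errors corrected, reboot advised" ;;
            4)   msg="errors left uncorrected" ;;
            8)   msg="operational error" ;;
            16)  msg="usage or syntax error" ;;
            32)  msg="canceled by user" ;;
            128) msg="shared library error" ;;
        esac
        out="${out:+$out; }$msg"
    done
    echo "${out:-no errors}"
)
```

In the plan this would follow each `e2fsck -vfp` call, e.g. `e2fsck -vfp /dev/$MDTDEV; rc=$?; fsck_status_ok "$rc" || echo "MDT: $(describe_fsck_status "$rc")" >&2`, and stop there rather than continuing to the database pass.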
###OST Recovery
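# Run on each OSS. OSTDEVS = that server's OST devices; N numbers the
# ostdb files (set N beforehand to the last index used on the previous
# OSS so every OST in the filesystem gets a unique /tmp/ostNdb)
N=${N:-0}
for OSTDEV in $OSTDEVS; do
    N=$((N + 1))
    # Capture fs state before doing anything
    e2fsck -vfn /dev/$OSTDEV
    # "safe" repair
    e2fsck -vfp /dev/$OSTDEV
    # Verify no more problems and generate this OST's ostdb
    e2fsck -vfn --mdsdb /tmp/mdsdb --ostdb /tmp/ost${N}db /dev/$OSTDEV
done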
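Every database named on the lfsck command line has to exist on the node where lfsck runs. When the OSTs sit on several servers, each ostdb is written locally on its OSS and has to be collected first, so a quick check for missing or empty files before starting lfsck is cheap insurance. A minimal sketch; `check_db_files` is a hypothetical helper name:

```shell
# Hypothetical helper: succeeds only if every file given exists and is
# non-empty; reports each missing or empty file on stderr.
check_db_files() (
    missing=0
    for db in "$@"; do
        if [ ! -s "$db" ]; then
            echo "missing or empty: $db" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
)
```

For example, `check_db_files /tmp/mdsdb /tmp/ost1db /tmp/ost2db && echo ready` on the client before running lfsck.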
### Recover lost+found Objects
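# Run on each OSS, one OST at a time
for OSTDEV in $OSTDEVS; do
    mount -t ldiskfs /dev/$OSTDEV /mnt/ost
    ll_recover_lost_found_objs -v -d /mnt/ost/lost+found
    # Unmount before the next OST (and before starting it as Lustre again)
    umount /mnt/ost
done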
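ll_recover_lost_found_objs moves the objects it can identify out of lost+found and back into the OST's object directories; whatever it cannot place stays behind. Counting what is left in lost+found while the OST is still mounted as ldiskfs is a quick way to see whether anything needs a closer look. A minimal sketch; `count_leftovers` is a hypothetical helper name:

```shell
# Hypothetical helper: print how many entries (including dotfiles, but
# not . and ..) remain in a directory.
count_leftovers() (
    dir=$1 n=0
    for entry in "$dir"/* "$dir"/.[!.]* "$dir"/..?*; do
        # An unmatched glob stays literal; skip it unless it really exists
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        n=$((n + 1))
    done
    echo "$n"
)
```

For example, `count_leftovers /mnt/ost/lost+found` just before the `umount`; a result of 0 means everything was moved back.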
### Coherency Check
# On a client with /lustre mounted, after copying /tmp/mdsdb and all the
# ostdb files to it
lfsck -n -v --mdsdb /tmp/mdsdb --ostdb \
    /tmp/ost1db,/tmp/ost2db,...,/tmp/ostNdb /lustre
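The `...` in the lfsck line stands for one ostdb file per OST. Assuming the /tmp/ost1db ... /tmp/ostNdb naming used in this plan and the comma separator written above (worth confirming against lfsck's own usage message for the version in use), the list can be generated instead of typed. A minimal sketch; `ostdb_list` is a hypothetical helper name:

```shell
# Hypothetical helper: print "/tmp/ost1db,/tmp/ost2db,...,/tmp/ost<count>db",
# following the naming assumed in the recovery plan.
ostdb_list() (
    count=$1 i=1 list=""
    while [ "$i" -le "$count" ]; do
        list="${list:+$list,}/tmp/ost${i}db"
        i=$((i + 1))
    done
    echo "$list"
)
```

With, say, 8 OSTs: `lfsck -n -v --mdsdb /tmp/mdsdb --ostdb "$(ostdb_list 8)" /lustre`.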