[Lustre-discuss] disappeared data from OST

Peter Grandi pg_lus at lus.for.sabi.co.UK
Mon Feb 15 15:30:48 PST 2010


> After a power spike this weekend that crashed several machines
> (not the OSS'es...) and/or possibly hitting 100% file space
> usage on one of them (we have been dangerously close for a
> while), it hung this morning.

That's fairly clear, but did you check whether all the drives
involved are entirely error-free? How do you know your storage
system is still good to use?

Also, did you have battery backup for at least the storage host
adapters (HAs)?

> After restarting, it showed many files as missing. [ ... ]
> Now I am afraid that if I carry on (probably just cycling the
> power, since "reboot" also hangs), it will come back in the
> same state, i.e. 95% of the data gone. Is this already
> irreparably the case, or am I just paranoid?  Any suggestions
> would be appreciated (in other words: HELP!!!!).

There is one simple solution: restore from backups. Situations
like this are what they are for. If the backups are on suitable
media, a restore is probably much faster than any attempt at
recovery. I think that in many cases restoring from backup is
faster even than running 'fsck' over the damaged filesystems.
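
To put rough numbers on the restore side of that comparison,
here is a minimal sketch. It assumes the restore is a purely
sequential stream limited only by sustained bandwidth, and that
parallel streams scale linearly. The function name and all the
figures are made up for illustration; they are not measurements
from any real system. It says nothing about how long 'fsck'
takes, which depends on the filesystem and the damage.

```python
def restore_hours(data_tb, rate_mb_s, streams=1):
    """Estimate hours needed to stream a backup back onto storage.

    data_tb   -- amount of data to restore, in decimal terabytes
    rate_mb_s -- sustained rate of ONE restore stream, in MB/s
    streams   -- number of parallel streams (assumed to scale linearly)
    """
    if data_tb < 0 or rate_mb_s <= 0 or streams < 1:
        raise ValueError("need data_tb >= 0, rate_mb_s > 0, streams >= 1")
    total_mb = data_tb * 1_000_000           # 1 TB = 10^6 MB (decimal units)
    seconds = total_mb / (rate_mb_s * streams)
    return seconds / 3600.0


if __name__ == "__main__":
    # Hypothetical example: 3.6 TB to restore at 100 MB/s per stream.
    for n in (1, 4):
        print(f"{n} stream(s): {restore_hours(3.6, 100, n):.1f} hours")
```

The point of the arithmetic is only that restore time is bounded
by bandwidth and so is predictable in advance, provided the
backup media can actually sustain the assumed rate.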

As to that, I reckon it is often little appreciated that the
most cost-effective way to back up a large Lustre storage pool
efficiently may be another Lustre storage pool. Lustre can make
pretty good backup servers (excellent sequential write rates
from cheap, low-IOPS drives, over Ethernet).


