[Lustre-discuss] filesystem corruption
Kevin Van Maren
Kevin.Vanmaren at Sun.COM
Sun Sep 6 07:44:50 PDT 2009
This subject has been discussed many times...
Write-back cache needs to be disabled not just on the controller, but on
the drives as well.
The problem is with write-back caches that _lie_ about the data being in
persistent store. The drive itself, with write-back cache enabled, lies
and says data is on disk. RAID controllers likewise use write-back
cache to lie about the data being on disk.
So why do they lie? Because it makes the operating system run faster,
as it doesn't have to wait as long for the data to be "on disk".
What is the problem? The reason the OS is waiting for the data to be
"on disk" is to ensure consistency of the filesystem. If the
controller/drive says the data is in persistent store, but it is not
actually there, and the system loses power/crashes/experiences some
other problem, then when the filesystem comes up things aren't in a
consistent state.
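
For illustration, here is what "waiting for the data to be on disk" looks like from an application, as a minimal Python sketch assuming a POSIX system. The helper name durable_write is made up for this example. Note that fsync() only asks the device to persist the data; if the drive or controller acknowledges the flush while the data still sits in a volatile write-back cache, the call returns successfully anyway, which is exactly the lie described above.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data and return only after the OS has asked the device to persist it.

    Hypothetical helper for illustration; assumes POSIX semantics.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Block until the kernel reports the file data as flushed to the device.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Also flush the containing directory so the new file name is persisted.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "record.bin")
        durable_write(target, b"must survive a crash")
        print(os.path.getsize(target))
```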
With ext3, the journal is used to ensure the filesystem is recoverable
-- assuming the controller does not lie -- even if the outstanding
writes do not complete. So while there may be loss of data, the
filesystem is not mangled due to a hard crash. [Journaling is only one
of many approaches taken over the years to improve performance; see also
Kirk McKusick's work on soft updates for the BSD FFS filesystem --
http://www.ece.cmu.edu/~ganger/papers/mckusick99.pdf]
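
To make the journal argument concrete, here is a toy Python model. It is a sketch of the general write-ahead idea only, not ext3's actual on-disk format, and every name in it (Disk, update, recover, crash_test, the "free"/"used" counters) is invented for the example. The filesystem invariant is that free + used == 100. The journal record and commit marker are flushed before the in-place writes, so an honest device can always be replayed to a consistent state, while a device that acknowledges flushes without persisting anything can leave a half-applied update behind.

```python
class Disk:
    """Toy block device with a volatile write-back cache (illustrative only)."""

    def __init__(self, honest: bool):
        self.honest = honest
        self.platter = {}  # survives power loss
        self.cache = {}    # lost on power loss

    def write(self, key, value):
        self.cache[key] = value

    def flush(self):
        """The OS calls this when it needs data on stable storage."""
        if self.honest:
            self.platter.update(self.cache)
            self.cache.clear()
        # A lying device acknowledges the flush but leaves data in cache.

    def power_loss(self, survivors=()):
        """Drop the cache. `survivors` models cached writes the device
        happened to destage, in whatever order it liked, before the crash."""
        for key in survivors:
            if key in self.cache:
                self.platter[key] = self.cache[key]
        self.cache.clear()


def update(disk, free, used):
    """Journaled update of two fields that must change together."""
    disk.write("journal", {"free": free, "used": used})
    disk.flush()                 # journal record must be stable first
    disk.write("commit", True)
    disk.flush()                 # then the commit marker
    disk.write("free", free)     # in-place writes may complete later
    disk.write("used", used)


def recover(disk):
    """Replay a committed journal record, then return free + used."""
    journal = disk.platter.get("journal")
    if disk.platter.get("commit") and journal is not None:
        disk.platter.update(journal)
    return disk.platter["free"] + disk.platter["used"]


def crash_test(honest, survivors):
    disk = Disk(honest)
    disk.platter.update({"free": 100, "used": 0})
    update(disk, free=90, used=10)
    disk.power_loss(survivors)
    return recover(disk)


if __name__ == "__main__":
    # Only the in-place "free" write reached the platter before power loss.
    print(crash_test(honest=True, survivors=("free",)))   # 100: replayed, consistent
    print(crash_test(honest=False, survivors=("free",)))  # 90: invariant broken
```

In the lying case the device destaged the in-place write ahead of the journal it had already claimed was stable, so recovery finds no commit record and cannot repair the damage.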
Note that write-back caches do not always lie about data being in stable
storage -- _some_ HW RAID controllers have special features that turn
the controller cache into non-volatile storage, using a mirrored write
cache and battery backup. Battery backup makes it less likely that the
controller is lying, at least until the system loses power for several
days and the battery dies.
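
As a rough way to see what the kernel believes about a device's cache, newer Linux kernels (newer than those current when this thread was written) expose a queue/write_cache attribute in sysfs that reads "write back" or "write through". The Python sketch below just reads that attribute; the function name and the device name "sda" are placeholders. Treat the result with care: it reflects what the device reports to the kernel, so a RAID controller presenting a logical volume may still cache internally, and the attribute says nothing about whether a battery is present or healthy.

```python
from pathlib import Path
from typing import Optional


def write_cache_mode(device: str, sysfs_root: str = "/sys/block") -> Optional[str]:
    """Return the kernel's reported cache mode for a block device.

    Returns the attribute text (e.g. "write back" or "write through"),
    or None if the attribute does not exist or cannot be read.
    """
    attr = Path(sysfs_root) / device / "queue" / "write_cache"
    try:
        return attr.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # "sda" is a placeholder; substitute the device backing the OST.
    print(write_cache_mode("sda"))
```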
Kevin
Mag Gam wrote:
> So, I have to ask
>
> Why disable write-back cache on the controller?
>
>
>
> 2009/9/4 恩强周 <eqzhou at gmail.com>:
>
>> It's really dangerous! e2fsck brought it back.
>>
>> 2009/9/3 Peter Kjellstrom <cap at nsc.liu.se>
>>
>>> On Thursday 03 September 2009, 恩强周 wrote:
>>>
>>>> hi all,
>>>>
>>>> My Lustre filesystem was corrupted when an OSS was accidentally
>>>> powered off via IPMI. I got the following messages after that OSS
>>>> restarted.
>>>>
>>> ...
>>>
>>>> Our OSS server is running lustre-1.8.0, equipped with an Areca RAID
>>>> adapter with write cache enabled.
>>>>
>>> Drive write-back cache is dangerous, controller write-back cache is
>>> dangerous if you don't have a battery backup unit on the card. Which
>>> group are you in?
>>>
>>> Either way the next step is probably fsck while keeping your fingers
>>> crossed.
>>>
>>> /Peter
>>>
>>>
>>>> I am worried that data on Lustre may have been lost.
>>>> What is the cause of such a problem, and how can I fix it?
>>>>
>>>> Thanks in advance!
>>>>