[Lustre-discuss] filesystem corruption

Sun Sep 6 07:44:50 PDT 2009

This subject has been discussed many times...

Not just the controller, but the drives as well.

The problem is with write-back caches that _lie_ about the data being in 
persistent store.  The drive itself, with write-back cache enabled, lies 
and says data is on disk.  RAID controllers likewise use write-back 
cache to lie about the data being on disk.

So why do they lie?  Because it makes the operating system run faster, 
as it doesn't have to wait as long for the data to be "on disk".
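For reference, here is roughly what "waiting for the data to be on disk" looks like from an application. This is a minimal Python sketch that assumes Linux/POSIX semantics; durable_write is a hypothetical helper name, and the write-to-temp-file-then-rename pattern is one common approach, not the only one. os.fsync() blocks until the kernel reports the data as written. Whether that includes a cache-flush command to the drive depends on the filesystem and its barrier settings. If the drive or controller acknowledges the flush without persisting anything, fsync() has no way to tell.

```python
import os
import tempfile


def durable_write(path, data):
    """Write bytes to path and wait until the OS reports them on stable storage.

    Assumes Linux/POSIX semantics. fsync() only knows what the device tells
    it: a drive or controller that acknowledges a cache flush without
    persisting the data will fool it.
    """
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:                # os.write may write fewer bytes than asked
            n = os.write(fd, view)
            view = view[n:]
        os.fsync(fd)               # block until the file data is reported durable
    finally:
        os.close(fd)
    os.replace(tmp, path)          # atomically swap the new file into place
    # fsync the directory too, so the rename itself survives a crash
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "example.dat")
        durable_write(p, b"hello")
        with open(p, "rb") as f:
            print(f.read())
```

Every one of those fsync() calls is a wait the OS would rather skip, and that is exactly the wait a lying write-back cache shortens.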

What is the problem?  The reason the OS waits for the data to be 
"on disk" is to ensure consistency of the filesystem.   If the 
controller/drive says the data is in persistent store, but it is not 
actually there, and the system loses power/crashes/experiences some 
other problem, then when the filesystem comes back up it may not be in a 
consistent state.
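A toy model makes the failure concrete. This is a hypothetical Python sketch, not how any particular drive or controller works: WriteBackDisk acknowledges writes as soon as they are in a volatile cache, and honest_flush=False stands in for a device that reports a flush as complete without actually persisting anything.

```python
class WriteBackDisk:
    """Toy model of a device with a volatile write-back cache (hypothetical)."""

    def __init__(self, honest_flush=True):
        self.persistent = {}            # block -> data that survives power loss
        self.cache = {}                 # volatile write-back cache
        self.honest_flush = honest_flush

    def write(self, block, data):
        # Acknowledged immediately, although the data is only in the cache.
        self.cache[block] = data
        return "ack"

    def flush(self):
        # The OS asks for everything to reach stable storage.
        if self.honest_flush:
            self.persistent.update(self.cache)
            self.cache.clear()
        return "ack"                    # a lying device says "done" either way

    def power_loss(self):
        # Whatever is still in the volatile cache is gone.
        self.cache.clear()

    def read(self, block):
        return self.cache.get(block, self.persistent.get(block))


for honest in (True, False):
    disk = WriteBackDisk(honest_flush=honest)
    disk.write(0, "inode table v2")
    disk.flush()
    disk.power_loss()
    print(honest, disk.read(0))
```

With the honest device the flushed block survives the power loss; with the lying one, read(0) returns None even though both the write and the flush were acknowledged.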

With ext3, the journal is used to ensure the filesystem is recoverable 
-- assuming the controller does not lie -- even if the outstanding 
writes do not complete.  So while there may be loss of data, the 
filesystem is not mangled by a hard crash.  [Journaling is only one 
of many approaches taken over the years to improve performance while 
preserving filesystem consistency; see also Kirk McKusick's work on soft 
updates for the BSD FFS filesystem -- 
http://www.ece.cmu.edu/~ganger/papers/mckusick99.pdf]
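The journaling idea can be sketched as redo-log recovery. This is a simplified Python illustration, not ext3's (jbd) on-disk format. Each record is assumed to be ("data", txid, block, contents) or ("commit", txid), and recover() applies only transactions whose commit record reached stable storage. The last case shows why the lying cache matters. If the device persists the commit record but drops a data record that was written before it, replay produces exactly the inconsistency the journal was supposed to prevent.

```python
def recover(journal, home):
    """Replay a redo journal after a crash (simplified, hypothetical format).

    journal: records in the order they reached stable storage, each either
             ("data", txid, block, contents) or ("commit", txid)
    home:    dict of block -> contents at the blocks' home locations
    Transactions without a commit record are discarded, so each committed
    transaction is applied all-or-nothing, provided the device kept its
    promises about what was on disk.
    """
    committed = {rec[1] for rec in journal if rec[0] == "commit"}
    recovered = dict(home)
    for rec in journal:
        if rec[0] == "data" and rec[1] in committed:
            _, _, block, contents = rec
            recovered[block] = contents
    return recovered


home = {"inode": "old", "bitmap": "old"}

# Crash before the commit record hit disk: transaction 1 is discarded.
print(recover([("data", 1, "inode", "new"), ("data", 1, "bitmap", "new")], home))
# → {'inode': 'old', 'bitmap': 'old'}

# Commit record is durable: both blocks are updated together.
print(recover([("data", 1, "inode", "new"), ("data", 1, "bitmap", "new"),
               ("commit", 1)], home))
# → {'inode': 'new', 'bitmap': 'new'}

# Lying cache: the commit record survived but the bitmap record did not.
print(recover([("data", 1, "inode", "new"), ("commit", 1)], home))
# → {'inode': 'new', 'bitmap': 'old'}
```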

Note that write-back caches do not always lie about being in stable 
storage -- _some_ HW RAID controllers do have special features to turn 
the controller cache into non-volatile storage, with mirrored write 
cache and battery backup.  Battery backup makes it less likely that the 
cache is lying, at least until the system loses power for several days 
and the battery runs down.
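The battery caveat reduces to a simple condition. Here is a hypothetical Python helper; the hold-up time is an assumed parameter, not any vendor's specification, so check the controller's documentation for the real figure.

```python
def cache_survives(outage_hours, has_battery, battery_holdup_hours=72.0):
    """Would dirty data in a controller's write cache survive a power outage?

    Hypothetical helper: battery_holdup_hours is an assumed value, not a
    vendor specification.
    """
    if not has_battery:
        return False                          # volatile DRAM loses its contents at once
    return outage_hours < battery_holdup_hours  # battery must outlast the outage


print(cache_survives(0.1, has_battery=False))     # plain write-back cache
print(cache_survives(2, has_battery=True))        # short outage with a BBU
print(cache_survives(5 * 24, has_battery=True))   # several days without power
```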

Kevin

Mag Gam wrote:
> So, I have to ask
>
> Why disable write-back cache on the controller?
>
>
>
> 2009/9/4 恩强周 <eqzhou at gmail.com>:
>   
>> It's really dangerous! e2fsck brought it back.
>>
>> 2009/9/3 Peter Kjellstrom <cap at nsc.liu.se>
>>     
>>> On Thursday 03 September 2009, 恩强周 wrote:
>>>       
>>>> Hi all,
>>>>
>>>> My Lustre filesystem was corrupted when an OSS was accidentally powered
>>>> off via IPMI. I got the following messages after that OSS restarted.
>>>>         
>>> ...
>>>       
>>>> Our OSS server is running lustre-1.8.0, equipped with an Areca RAID
>>>> adapter with write cache enabled.
>>>>         
>>> Drive write-back cache is dangerous; controller write-back cache is
>>> dangerous if you don't have a battery backup unit on the card. Which
>>> group are you in?
>>>
>>> Either way the next step is probably fsck while keeping your fingers
>>> crossed.
>>>
>>> /Peter
>>>
>>>       
>>>> I'm worried that data on Lustre may have been lost.
>>>> And what's the cause of such a problem? How can I fix it?
>>>>
>>>> Thanks in advance!
>>>>         