[Lustre-devel] Fwd: Disk rebuild

Andreas Dilger adilger at sun.com
Wed Dec 2 11:43:04 PST 2009

Hi Nikita!

On 2009-12-02, at 07:13, Nikita Danilov wrote:
>> On Tue, Dec 01, 2009 at 06:00:39PM +0300, Nikita Danilov wrote:
>>> what is the status of this? Is ext3 guided resync code (RHEL 5  
>>> version was posted on lkml in October) used by Lustre?
>> This is covered by bug 19932.
> The last (43rd) comment there is rather intriguing. Can you elaborate
> on why guided resync cannot work with the Lustre IO stack?

The problem lies in the way that obdfilter submits IO.  Since it is
not using the normal buffer cache to track "data=ordered" (or, in the
case of this patch, "data=declared") mode, the submit_bio() call will
likely start modifying the MD device before the corresponding declare
blocks are committed to the journal.
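The ordering requirement can be written as a check over a trace of IO events. This is a hypothetical model, not Lustre or ext3 code: the event names `declare_commit` and `md_write` and the per-block representation are assumptions made for illustration. A trace is safe for declared mode only if every write to the MD device happens after a journal commit of a declare record covering that block.

```python
from typing import Iterable, List, Tuple

# Hypothetical event model (not actual Lustre/ext3 code):
#   ("declare_commit", block) -- a declare record naming `block` is committed to the journal
#   ("md_write", block)       -- data for `block` is submitted to the MD RAID device
Event = Tuple[str, int]


def ordering_violations(trace: Iterable[Event]) -> List[int]:
    """Return blocks written to the MD device before their declare record was committed."""
    declared = set()
    violations = []
    for kind, block in trace:
        if kind == "declare_commit":
            declared.add(block)
        elif kind == "md_write":
            if block not in declared:
                violations.append(block)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return violations


# Buffer-cache path: the declare records are committed before the data writes start.
ordered = [("declare_commit", 10), ("declare_commit", 11), ("md_write", 10), ("md_write", 11)]
# Direct bio path: data for block 11 reaches the MD device before its declare record commits.
direct = [("declare_commit", 10), ("md_write", 10), ("md_write", 11), ("declare_commit", 11)]

print(ordering_violations(ordered))  # []
print(ordering_violations(direct))   # [11]
```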

This defeats the purpose of declared mode in case of a crash, since
we can no longer be certain that the declare blocks contain all of
the locations in the MD RAID array whose parity may need to be rebuilt.
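To see why one out-of-order write matters, consider what a guided resync does after a crash: it rebuilds parity only for the locations named in committed declare records, instead of scanning the whole array. The hypothetical model below (an illustration under that assumption, not the actual resync code) takes the blocks that may have been in flight at the time of the crash and the blocks recorded in committed declare records, and reports the blocks whose parity may be left inconsistent.

```python
from typing import Set


def unrecoverable_blocks(in_flight: Set[int], committed_declares: Set[int]) -> Set[int]:
    """Blocks whose parity might be stale but that a guided resync would skip.

    Hypothetical model: the resync rebuilds parity only for blocks listed in
    committed declare records, so any in-flight block missing from that set
    keeps whatever (possibly inconsistent) parity it had at the crash.
    """
    return in_flight - committed_declares


# Correct ordering: every in-flight write is covered by a committed declare record.
print(unrecoverable_blocks({10, 11}, {10, 11, 12}))  # set()
# Block 11 was submitted before its declare record committed, and the crash hit
# in between, so the resync never looks at block 11.
print(unrecoverable_blocks({10, 11}, {10}))  # {11}
```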

It would be possible to fix this by having the OST use the normal VFS  
methods to order the IO to disk, but I'm sure you're well aware of the  
performance impact of this.  It wouldn't be so bad with older versions  
of Lustre, where we had to wait for the journal commit before  
returning to the client anyway, but in 1.8.2 there is a (disabled by  
default) async journal commit option that allows the client to get RPC  
replies before the bulk IO is committed.

Accommodating declared mode would mean implementing full write-cached
IO on the OST.  That wouldn't be impossible, given that 1.8 already
uses the page cache for reading, but given the amount of change and
risk it would introduce, it wasn't thought worthwhile for the short
lifespan it would have.
It wouldn't be practical to introduce such a major change any sooner  
than the DMU OSD in the 2.1 release, at which point it is largely  

Cheers, Andreas
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.