[Lustre-discuss] async journals

Fri Dec 18 12:02:12 PST 2009

On 2009-12-17, at 15:25, Bernd Schubert wrote:
> I'm presently a bit puzzled about asynchronous journal patches.
>
> While I was just reading the jbd-journal-chksum-rhel53.patch patch,  
> I noticed it also adds a new option and feature  
> "journal_async_commit".
>
> But then ever since lustre-1.8.0 there is also a patch included for
> async journals from obdfilter. This patch is presently disabled,
> since it could cause data corruption on fail over.
>
> I now wonder how these two patches/features are related, so
> jbd/ldiskfs/ext4 (journal_async_commit) vs. obdfilter
> (obdfilter.*.sync_journal=0).

These are two completely independent changes, though they have  
confusingly similar names.

The jbd-journal-checksum patch adds a generic feature to the JBD
transaction commit that avoids one synchronous operation per
transaction commit.  Originally, JBD had to write out all of the
transaction data blocks, sync them to disk, then submit the
transaction commit block and sync it to disk before the transaction
could be considered committed.

The addition of a checksum to the transaction commit block allows the
journal replay code to determine for itself whether all of the
transaction data blocks reached disk before the commit block, and to
allow or deny the transaction during journal replay based on that.  If
the journal_async_commit feature is enabled on the journal, then JBD
will skip this pre-commit-block sync.  However, this is not enabled by
default, since a problem was found in the upstream ext4 code where
blocks modified outside of the journal caused checksum failures during
replay.
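
To make the replay decision concrete, here is a toy sketch of the idea
in Python.  It is not the JBD on-disk format or its real checksum
code; CommitBlock, checksum_blocks, make_commit and replay_allowed are
names made up for illustration, and CRC32 from zlib merely stands in
for whatever checksum the journal actually uses.

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CommitBlock:
    """Toy commit record (hypothetical layout, not the JBD format)."""
    tid: int        # transaction ID
    checksum: int   # checksum over all data blocks of the transaction


def checksum_blocks(blocks: List[bytes]) -> int:
    """Chain a CRC32 over every block, in order."""
    crc = 0
    for block in blocks:
        crc = zlib.crc32(block, crc)
    return crc


def make_commit(tid: int, blocks: List[bytes]) -> CommitBlock:
    """Build the commit block at commit time, from the in-memory blocks."""
    return CommitBlock(tid, checksum_blocks(blocks))


def replay_allowed(on_disk_blocks: List[bytes],
                   commit: Optional[CommitBlock]) -> bool:
    """Replay-time check: accept the transaction only if a commit block
    exists and the blocks found on disk match its checksum."""
    if commit is None:
        return False
    return checksum_blocks(on_disk_blocks) == commit.checksum
```

With such a check in place, a commit block that reached disk ahead of
some of its data blocks is detected at replay time instead of being
prevented by an extra sync at commit time.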

The Lustre async journal commit is essentially adding support for  
Lustre clients to be able to submit write requests to the OST, but not  
require the server to do that IO synchronously.  Instead, the client  
will keep a copy of the data in cache until it gets a commit  
notification from the OST, and will rewrite the data if the OST  
crashes.  This allows a single client to submit a large number of  
writes without having to commit the journal transaction.  This feature  
is still under testing, so it is not enabled by default, and a bug was  
fixed in 1.8.2 related to recovery.
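
The client-side behaviour can be sketched roughly as below.  This is a
toy model only, not Lustre code: ToyOST, ToyClient and all of their
methods are hypothetical names, and the per-write transaction number
is an assumption made so that the commit notification has something
to refer to.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyOST:
    """Hypothetical server: acknowledges writes before committing them."""
    last_committed: int = 0
    next_transno: int = 1
    uncommitted: Dict[int, bytes] = field(default_factory=dict)
    disk: Dict[int, bytes] = field(default_factory=dict)

    def write(self, data: bytes) -> int:
        """Accept a write without syncing; return its transaction number."""
        transno = self.next_transno
        self.next_transno += 1
        self.uncommitted[transno] = data
        return transno

    def commit(self) -> int:
        """Commit everything accepted so far; return the commit point."""
        self.disk.update(self.uncommitted)
        self.uncommitted.clear()
        self.last_committed = self.next_transno - 1
        return self.last_committed

    def crash(self) -> None:
        """Lose all uncommitted writes and restart after the commit point."""
        self.uncommitted.clear()
        self.next_transno = self.last_committed + 1


@dataclass
class ToyClient:
    """Hypothetical client: caches data until told it is committed."""
    pending: Dict[int, bytes] = field(default_factory=dict)

    def write(self, ost: ToyOST, data: bytes) -> int:
        transno = ost.write(data)
        self.pending[transno] = data   # keep a copy until commit
        return transno

    def on_commit(self, last_committed: int) -> None:
        """Drop cached copies covered by the commit notification."""
        for transno in [t for t in self.pending if t <= last_committed]:
            del self.pending[transno]

    def replay(self, ost: ToyOST) -> None:
        """After an OST crash, resend every write not known committed."""
        lost = [self.pending[t] for t in sorted(self.pending)]
        self.pending.clear()
        for data in lost:
            self.write(ost, data)
```

In this model a client can issue many writes between two commits, and
a crash costs only a resend of whatever is still in its cache.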

> When I did some tests with lctl set_param  
> obdfilter.*.sync_journal=0, it even slightly reduced performance. So  
> I wonder if one additionally needs to enable jbd-async journals?

This is only expected to improve performance when a small number of
clients are doing IO to each OST.  Once there are many clients doing
IO at the same time, there is enough IO per transaction that the
commit does not noticeably affect performance.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.