[Lustre-devel] async write and abort_recov

John Hammond jhammond at ices.utexas.edu
Thu Jul 15 11:57:51 PDT 2010

On 07/15/2010 11:19 AM, Andreas Dilger wrote:
> On 2010-07-15, at 02:05, Aurelien Degremont wrote:
>> Andreas Dilger wrote:
>>> While I know Lustre will save errors from async write RPCs into
>>> the file descriptor (for later write calls or fsync), I don't know
>>> if we save any IO error into the file descriptor if we discard
>>> pages due to eviction.  I think only errors due to currently
>>> in-flight RPCs that are aborted due to client eviction are
>>> returned.

If the async write fails due to eviction then writepage() will store 
-ESHUTDOWN in the inode info's lli_async_rc member.
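
As a rough illustration of that behaviour, here is a toy model of a
per-inode error slot.  It is not the Lustre code: struct toy_inode and
both function names are made up for this sketch, and clearing the slot
once it has been read is an assumption about how the saved error is
consumed.

```c
#include <errno.h>

/* Hypothetical stand-in for the per-inode state; not the real Lustre
 * inode info.  async_rc plays the role of lli_async_rc. */
struct toy_inode {
    int async_rc;   /* 0, or a negative errno left by a failed async write */
};

/* Called at the point where writepage() would notice the failure. */
void toy_record_async_error(struct toy_inode *inode, int rc)
{
    if (rc < 0)
        inode->async_rc = rc;
}

/* Called by a later operation on the same inode.  Returns the saved
 * error and clears it (assumption), so only one caller ever sees it. */
int toy_collect_async_error(struct toy_inode *inode)
{
    int rc = inode->async_rc;

    inode->async_rc = 0;
    return rc;
}
```

Because the slot lives on the inode rather than on an open file, in
this model whichever caller collects first gets the error, whether or
not it issued the write that failed.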

>> Sounds like a bug to me?  That means that if a process writes data on a
>> client, the data goes to the page cache, not yet to the OST if there is
>> no local memory pressure.  At that moment, if the client is evicted,
>> those pages are dropped.  Then the client reconnects and the process
>> writes other data.  Those I/Os are successful, so the client has missed
>> that some previous I/O failed?

I filed a bug because the async errors weren't being reported; see
https://bugzilla.lustre.org/show_bug.cgi?id=22360.  It looks like this
is addressed in 1.8.4.  From then on, they should be reported on the next
call to close() for that inode, but note that the error need not go to
the processes whose writes were lost.  Too bad!
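
On the application side, the practical consequence is to check the
return value of every write(), fsync() and close() rather than only
write().  A minimal sketch using plain POSIX calls follows; the helper
name write_checked() is made up for illustration, and nothing in it is
Lustre-specific.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path.
 * Returns 0 on success or a negative errno value on failure. */
int write_checked(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int rc = 0;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -errno;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing; retry */
            rc = -errno;
            break;
        }
        p += n;
        len -= (size_t)n;
    }

    /* fsync() pushes cached pages out, so a writeback failure that
     * write() could not see yet can be reported here. */
    if (rc == 0 && fsync(fd) < 0)
        rc = -errno;

    /* close() may also return a deferred error.  Keep the first error
     * seen, and do not retry close() on failure. */
    if (close(fd) < 0 && rc == 0)
        rc = -errno;

    return rc;
}
```

A caller would treat any nonzero result as "the data may not be on
disk", for example: if (write_checked("out.dat", data, size) != 0)
then rewrite or report.  This only helps the process that makes the
call; it cannot recover an error already handed to another process.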

John L. Hammond, Ph.D.
ICES, The University of Texas at Austin
jhammond at ices.utexas.edu
(512) 471-9304
