[Lustre-devel] async write and abort_recov

John Hammond jhammond at ices.utexas.edu
Thu Jul 15 11:57:51 PDT 2010


On 07/15/2010 11:19 AM, Andreas Dilger wrote:
> On 2010-07-15, at 02:05, Aurelien Degremont wrote:
>> Andreas Dilger wrote:
>>> While I know Lustre will save errors from async write RPCs into
>>> the file descriptor (for later write calls or fsync), I don't know
>>> if we save any IO error into the file descriptor if we discard
>>> pages due to eviction.  I think only errors due to currently
>>> in-flight RPCs that are aborted due to client eviction are
>>> returned.

If the async write fails due to eviction, then writepage() will store
-ESHUTDOWN in the inode info's lli_async_rc member.

>> That sounds like a bug to me.  It means that if a process writes data
>> on a client, the data goes to the page cache, not yet to the OST, as
>> long as there is no local memory pressure. If the client is evicted
>> at that moment, those pages are dropped. Then the client reconnects
>> and the process writes other data. Those I/Os succeed, so has the
>> client missed the fact that some previous I/O failed?

I filed a bug because the async errors weren't being reported; see
https://bugzilla.lustre.org/show_bug.cgi?id=22360.  It looks like this
is addressed in 1.8.4.  From that release on, they should be reported on
the next call to close() for that inode; but note that the error need
not go to the processes whose writes were lost.  Too bad!

-- 
John L. Hammond, Ph.D.
ICES, The University of Texas at Austin
jhammond at ices.utexas.edu
(512) 471-9304