[Lustre-devel] async write and abort_recov
John Hammond
jhammond at ices.utexas.edu
Thu Jul 15 11:57:51 PDT 2010
On 07/15/2010 11:19 AM, Andreas Dilger wrote:
> On 2010-07-15, at 02:05, Aurelien Degremont wrote:
>> Andreas Dilger wrote:
>>> While I know Lustre will save errors from async write RPCs into
>>> the file descriptor (for later write calls or fsync), I don't know
>>> if we save any IO error into the file descriptor if we discard
>>> pages due to eviction. I think only errors due to currently
>>> in-flight RPCs that are aborted due to client eviction are
>>> returned.
If the async write fails due to eviction then writepage() will store
-ESHUTDOWN in the inode info's lli_async_rc member.
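
To make the store-then-report behavior concrete, here is a toy model in C.
It is only a sketch and not Lustre source: the struct and function names
are hypothetical, and keeping only the first recorded error is an
assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for per-inode client state; not the real Lustre struct. */
struct toy_inode_info {
    int async_rc;   /* 0, or a negative errno left by a failed async write */
};

/* Models the writepage() side: remember that an async write was lost.
 * Assumption for this sketch: only the first error is kept. */
void toy_record_async_error(struct toy_inode_info *lli, int rc)
{
    assert(lli != NULL && rc < 0);
    if (lli->async_rc == 0)
        lli->async_rc = rc;
}

/* Models a later call on the same inode: report the saved error once,
 * then clear it so that subsequent callers see success. */
int toy_consume_async_error(struct toy_inode_info *lli)
{
    assert(lli != NULL);
    int rc = lli->async_rc;
    lli->async_rc = 0;
    return rc;
}
```

In this model the saved error belongs to the inode, not to a process, so
whichever caller consumes it first is the one that sees -ESHUTDOWN.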
>> Sounds like a bug to me? That means that if a process writes data on
>> a client, the data goes to the page cache, and not yet to the OST if
>> there is no local memory pressure. If the client is evicted at that
>> moment, those pages are dropped. Then the client reconnects and the
>> process writes other data. Those I/Os succeed, and the client never
>> notices that some previous I/O failed?
I filed a bug because the async errors weren't being reported; see
https://bugzilla.lustre.org/show_bug.cgi?id=22360. It looks like this
is addressed in 1.8.4. From that release on, they should be reported on
the next call to close() for that inode, but note that the error need
not go to the processes whose writes were lost. Too bad!
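
On the application side, the practical consequence is to check the return
values of fsync() and close(), not just write(). A minimal sketch using
only POSIX calls follows; write_checked() is a hypothetical helper name,
and nothing in it is Lustre-specific.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path.
 * Returns 0 on success or a negative errno value on failure. */
int write_checked(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -errno;

    const char *p = buf;
    int rc = 0;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing; retry */
            rc = -errno;
            break;
        }
        if (n == 0) {           /* no progress; treat as an I/O error */
            rc = -EIO;
            break;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Push cached pages out so a deferred write error can surface here. */
    if (rc == 0 && fsync(fd) < 0)
        rc = -errno;

    /* close() may also report a deferred error; keep the first failure. */
    if (close(fd) < 0 && rc == 0)
        rc = -errno;

    return rc;
}
```

A successful write() alone only means the data reached the page cache, so
a caller that skips the fsync() and close() checks can miss a lost write.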
--
John L. Hammond, Ph.D.
ICES, The University of Texas at Austin
jhammond at ices.utexas.edu
(512) 471-9304