[Lustre-discuss] osc_brw_redo_request error on clients

Oleg Drokin green at whamcloud.com
Wed Feb 9 16:34:08 PST 2011


Hello!

On Feb 9, 2011, at 7:24 PM, James Robnett wrote:

>> Normally I've had no problems but recently I have multiple clients
>> reporting the following error:
>> 
>> LustreError: 3935:0:(osc_request.c:1629:osc_brw_redo_request()) @@@ redo
>> for recoverable error  req@ffff8101ae084000 x1358858531428366/t60136289752
>> o4->lustre-OST0004_UUID@192.168.1.12@o2ib:6/4 lens 448/608 e 0 to 1 dl
>> 1297285890 ref 2 fl Interpret:R/0/0 rc 0/0
>> 
>> which in turn appears to generate a premature EOF on our user software.
>> 
>> There are no corresponding errors on the servers.
>   The above is not true.  There are apparently corresponding errors of
> the form:
> Feb  9 17:05:21 lustre-oss-1 kernel: LustreError:
> 2964:0:(ost_handler.c:1038:ost_brw_write()) client csum f00001, server
> csum 964d53e2
> Feb  9 17:05:21 lustre-oss-1 kernel: LustreError:
> 2964:0:(ost_handler.c:1038:ost_brw_write()) Skipped 43 previous similar
> messages
>   The other OSS shows similar errors.  We are doing mmap I/O and a
> search implies those errors are related to mmap I/O.

OK, so this is it. With mmap, the page can be changed after its checksum is computed but before it is sent, so the CRC check fails on the server and the request is resent
(I am a bit surprised there is no CRC error in the client logs, though).
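
To make the failure mode concrete, here is a minimal sketch of that race in plain Python. It is not Lustre code: client_send, server_verify and scribble are hypothetical names, and zlib.crc32 only stands in for whatever checksum the client and OSS actually negotiate. The point is just that a page touched between "checksum computed" and "data sent" no longer matches the checksum that travels with it.

```python
import zlib


def client_send(page, mutate_after_checksum=None):
    """Hypothetical client: checksum the page, then hand it to the wire.

    Returns (payload, client_checksum) as the server would receive them.
    """
    client_csum = zlib.crc32(page)        # checksum is computed first
    if mutate_after_checksum is not None:
        mutate_after_checksum(page)       # an mmap writer touches the page
    return bytes(page), client_csum       # page contents as actually sent


def server_verify(payload, client_csum):
    """Hypothetical server: recompute the checksum over what arrived."""
    return zlib.crc32(payload) == client_csum


def scribble(page):
    """Stand-in for the application storing into its mapped page."""
    page[0] = ord("B")


page = bytearray(b"A" * 4096)
quiet = server_verify(*client_send(page))            # nobody touched the page
raced = server_verify(*client_send(page, scribble))  # page changed in between

print("untouched page verifies:", quiet)
print("page changed after checksum verifies:", raced)
```

In the second case the data itself is fine; only the checksum is stale, which is why a resend normally clears it.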

>   I'm open to suggestions, in the meantime the userspace code can be
> switched from mmap to regular file I/O via an rc file so we'll try that
> and see if it at least makes the errors go away.

Well, your options are either to disable mmap in the code (the Lustre mmap code is not super fast,
so if that's a real option, give it a try and you might find that it speeds everything up too)
or to disable checksum checking.
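
For the first option, a switch between mmap and regular file I/O can be as small as the sketch below. This is only an illustration in Python, not the user software in question: write_block and the use_mmap flag are hypothetical, and the flag stands in for whatever setting the rc file provides. It assumes a Unix system (os.pwrite) and a file that is already at least offset + len(data) bytes long.

```python
import mmap
import os
import tempfile


def write_block(path, offset, data, use_mmap):
    """Write data at offset, through a shared mapping or through pwrite()."""
    fd = os.open(path, os.O_RDWR)
    try:
        if use_mmap:
            # Map the whole file and store into the mapping.
            with mmap.mmap(fd, 0) as mapping:
                mapping[offset:offset + len(data)] = data
                mapping.flush()
        else:
            # Regular file I/O: loop in case of a short write.
            view = memoryview(data)
            while view:
                written = os.pwrite(fd, view, offset)
                view = view[written:]
                offset += written
    finally:
        os.close(fd)


# Small demonstration on a scratch file.
with tempfile.NamedTemporaryFile(delete=False) as scratch:
    scratch.write(b"\0" * 8192)
    path = scratch.name

write_block(path, 0, b"mmap", use_mmap=True)
write_block(path, 4096, b"plain", use_mmap=False)

with open(path, "rb") as check:
    contents = check.read()
os.unlink(path)
```

Both paths leave the same bytes in the file; the difference is that with the pwrite() path the application cannot modify the data while the client is checksumming and sending it.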

I also did some more digging, and in fact there was a patch included in 1.8.4 that essentially makes
the retry happen only once and ignores subsequent errors, so there still should be no user-visible failures.
(The patch is from bug 11742; the last comments there in fact reference messages just like the ones you see on the client,
with no ill effects from them.)

Bye,
    Oleg

