[Lustre-discuss] csum errors

Stuart Midgley sdm900 at gmail.com
Thu Aug 28 06:49:52 PDT 2008


For completeness, here are the matching logs from the client, 172.16.4.93 (clus093):

Aug 27 07:49:55 clus093 kernel: LustreError: 132-0: BAD WRITE CHECKSUM: changed on the client after we checksummed it - likely false positive due to mmap IO (bug 11742): from 172.16.0.25@tcp inum 24522277/1605841060 object 12021/0 extent [10485760-11534335]
Aug 27 07:49:55 clus093 kernel: LustreError: 28573:0:(osc_request.c:1162:check_write_checksum()) original client csum 2dbc1696 (type 2), server csum 9d081697 (type 2), client csum now 9d081697
Aug 27 07:49:55 clus093 kernel: LustreError: 28573:0:(osc_request.c:1372:osc_brw_redo_request()) @@@ redo for recoverable error req@ffff81012c434600 x4720217/t820873 o4->p1-OST0018_UUID@172.16.0.25@tcp:6/4 lens 384/480 e 0 to 100 dl 1219794694 ref 2 fl Interpret:R/0/0 rc 0/0
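
The three checksums in the check_write_checksum() line can be compared mechanically. The following is only a rough sketch, not Lustre code: the regular expression, the classify() helper and its return strings are hypothetical names made up for illustration, and the three-way comparison is simply my reading of the log messages above (the value recomputed on the client matches the server's value, so the page appears to have changed on the client after the first checksum was taken).

```python
import re

# The check_write_checksum() line from the client log, with the
# mail-wrapped pieces joined back into one line.
LINE = (
    "LustreError: 28573:0:(osc_request.c:1162:check_write_checksum()) "
    "original client csum 2dbc1696 (type 2), "
    "server csum 9d081697 (type 2), client csum now 9d081697"
)

# Hypothetical pattern for this one message format (assumption: the
# three values always appear in this order, as lowercase hex).
PATTERN = re.compile(
    r"original client csum (?P<orig>[0-9a-f]+).*?"
    r"server csum (?P<server>[0-9a-f]+).*?"
    r"client csum now (?P<now>[0-9a-f]+)"
)


def classify(line):
    """Return a rough verdict for one client-side checksum line."""
    m = PATTERN.search(line)
    if m is None:
        return None  # not a check_write_checksum() line
    orig, server, now = (int(m.group(k), 16) for k in ("orig", "server", "now"))
    if orig == server:
        return "match"
    if now == server:
        # The server saw the same data the client holds now, so the
        # data changed after the client first checksummed it.
        return "changed on client after checksum"
    if now == orig:
        # The client data is unchanged, so the difference arose on the way.
        return "changed in transit"
    return "inconclusive"


print(classify(LINE))  # prints "changed on client after checksum"
```

For the line logged here the result agrees with the "changed on the client after we checksummed it" message from the kernel, which fits the mmap IO explanation rather than corruption on the wire.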


-- 
Dr Stuart Midgley
sdm900 at gmail.com



On 28/08/2008, at 11:57 AM, Stuart Midgley wrote:

> We recently upgraded from 1.4.10.1 to 1.6.5.1 (clients and servers)
> and are now seeing errors like these on the OSS:
>
>
> Aug 27 07:49:54 oss025 kernel: LustreError: 3738:0:(ost_handler.c:1163:ost_brw_write()) client csum 2dbc1696, server csum 9d081697
> Aug 27 07:49:54 oss025 kernel: LustreError: 168-f: p1-OST0018: BAD WRITE CHECKSUM: changed in transit before arrival at OST from 12345-172.16.4.93@tcp inum 24522277/426969871 object 12021/0 extent [10485760-11534335]
> Aug 27 07:49:55 oss025 kernel: LustreError: 3738:0:(ost_handler.c:1225:ost_brw_write()) client csum 2dbc1696, original server csum 9d081697, server csum now 9d081697
>
>
> They always come from the same cluster node. Should we be worried?
> I suspect this means we shouldn't turn checksumming off. I assume
> these writes are rejected and resent by the client?
>
>
> -- 
> Dr Stuart Midgley
> sdm900 at gmail.com
>
>
>



