[Lustre-discuss] bad csum errors

Kevin Van Maren kevin.van.maren at oracle.com
Tue Sep 28 07:13:57 PDT 2010


https://bugzilla.lustre.org/show_bug.cgi?id=11742

Kevin


John White wrote:
> Hello Folks,
> 	Recently we've had a fair number of messages akin to the following coming from our OSS syslog:
> n0004: LustreError: 168-f: lrc-OST0002: BAD WRITE CHECKSUM: changed in transit before arrival at OST from 12345-10.4.8.194@o2ib inum 1409775/2324736913 object 1771080/0 extent [401408-2809855]
> n0004: LustreError: Skipped 13 previous similar messages
> n0004: LustreError: 10839:0:(ost_handler.c:1169:ost_brw_write()) client csum ae09a542, original server csum cfb6ab4b, server csum now cfb6ab4b
>
> There appear to be no specific clients, OSSs or OSTs in common.  We'll commonly get a block of messages concerning one OST w/ different clients involved and then move on to another OST.  As such, I'm doubting this is a memory issue.  Previous mails on this list mention MMAP, but there doesn't seem to be any mention in these messages.  Ideas?
>
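
One way to check whether the errors really are spread evenly across clients and OSTs is to tally them from the OSS syslog. The following is a minimal Python sketch, assuming the message wording shown in the sample above; the regex and the function name tally_bad_checksums are illustrative and not part of Lustre, and other Lustre versions may format the message differently.

```python
import re
from collections import Counter

# Pattern inferred from the sample log line above (an assumption):
#   LustreError: <code>: <OST name>: BAD WRITE CHECKSUM: ... from <client NID> ...
BAD_CSUM = re.compile(
    r"LustreError: \S+: (?P<ost>\S+): BAD WRITE CHECKSUM: .*? from (?P<client>\S+)"
)


def tally_bad_checksums(lines):
    """Count BAD WRITE CHECKSUM messages per OST and per client NID."""
    by_ost = Counter()
    by_client = Counter()
    for line in lines:
        match = BAD_CSUM.search(line)
        if match:
            by_ost[match.group("ost")] += 1
            by_client[match.group("client")] += 1
    return by_ost, by_client
```

Passing an open syslog file to tally_bad_checksums and printing by_ost.most_common() and by_client.most_common() shows whether a few clients or OSTs dominate. "Skipped N previous similar messages" lines are not counted, so the totals are lower bounds.
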
> ----------------
> John White
> High Performance Computing Services (HPCS)
> (510) 486-7307
> One Cyclotron Rd, MS: 50B-3209C
> Lawrence Berkeley National Lab
> Berkeley, CA 94720
>
