[Lustre-discuss] bad csum errors

John White jwhite at lbl.gov
Tue Sep 28 06:30:36 PDT 2010


Hello Folks,
	Recently we've had a fair number of messages like the following coming from our OSS syslog:
n0004: LustreError: 168-f: lrc-OST0002: BAD WRITE CHECKSUM: changed in transit before arrival at OST from 12345-10.4.8.194@o2ib inum 1409775/2324736913 object 1771080/0 extent [401408-2809855]
n0004: LustreError: Skipped 13 previous similar messages
n0004: LustreError: 10839:0:(ost_handler.c:1169:ost_brw_write()) client csum ae09a542, original server csum cfb6ab4b, server csum now cfb6ab4b

There appear to be no specific clients, OSSs or OSTs in common.  We typically get a block of messages concerning one OST, with different clients involved, and then the errors move on to another OST.  As such, I doubt this is a memory issue.  Previous mails on this list mention MMAP, but nothing in these messages seems to point to it.  Ideas?
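
One way to check the "nothing in common" impression is to tally the BAD WRITE CHECKSUM lines by OSS, OST and client NID. The following is only a rough Python sketch: the function name tally_bad_checksums is made up for illustration, and it assumes every syslog line has the same "host: LustreError: 168-f: ..." shape as the excerpt above (a different syslog prefix would need a different pattern). It accepts the client NID written either as addr@net or with the list archive's " at " substitution. Because of the "Skipped N previous similar messages" rate limiting, the counts are lower bounds.

```python
import re
from collections import Counter

# Pattern modeled on the single log line quoted in this post; the layout of
# other syslog setups is an assumption and may need adjusting.
BAD_CSUM = re.compile(
    r"(?P<oss>\S+): LustreError: 168-f: (?P<ost>\S+): BAD WRITE CHECKSUM: "
    r".*? from (?P<client>\S+?)(?:@| at )(?P<net>\S+)"
)


def tally_bad_checksums(lines):
    """Count BAD WRITE CHECKSUM messages per OSS, per OST and per client NID."""
    by_oss, by_ost, by_client = Counter(), Counter(), Counter()
    for line in lines:
        m = BAD_CSUM.search(line)
        if m is None:
            continue  # not a checksum line (e.g. "Skipped ... similar messages")
        by_oss[m["oss"]] += 1
        by_ost[m["ost"]] += 1
        by_client[f'{m["client"]}@{m["net"]}'] += 1
    return by_oss, by_ost, by_client


if __name__ == "__main__":
    # The one checksum line from this post; the pattern also accepts " at o2ib".
    sample = [
        "n0004: LustreError: 168-f: lrc-OST0002: BAD WRITE CHECKSUM: changed in "
        "transit before arrival at OST from 12345-10.4.8.194@o2ib inum "
        "1409775/2324736913 object 1771080/0 extent [401408-2809855]",
        "n0004: LustreError: Skipped 13 previous similar messages",
    ]
    for label, counts in zip(("OSS", "OST", "client"), tally_bad_checksums(sample)):
        print(label, counts.most_common(5))
```

If one client, OSS or OST dominates the counts over a longer log window, that would argue against the errors being evenly spread.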

----------------
John White
High Performance Computing Services (HPCS)
(510) 486-7307
One Cyclotron Rd, MS: 50B-3209C
Lawrence Berkeley National Lab
Berkeley, CA 94720



