[Lustre-discuss] ost_brw_write()

Andreas Dilger adilger at sun.com
Wed Nov 12 13:32:44 PST 2008


On Nov 12, 2008  08:10 -0500, Brian J. Murrell wrote:
> On Wed, 2008-11-12 at 07:17 -0500, Mag Gam wrote:
> > We noticed.
> > LustreError: 132-0: BAD WRITE CHECKSUM: changed in transit before
> > arrival at OST: from 192.168.0.3@tcp inum (somenumber)/(somenumber)
> > object (some number)/0 extent [0-4095]
> > 
> > It's actually coming from 2 particular hosts (1 OSS), another from 1
> > particular client.
> > 
> > I also see @@@ redo for unrecoverable error req@fff8xxxxxxxxxxxxxxxxxxxx
> > 
> > Any thoughts on how I can get rid of these messages?
> 
> Assuming it's not a bug in Lustre, fix whatever is mangling the data
> before it arrives at the OST.  Do you have errors on your networking
> fabric, or on the interfaces of the hosts on either end of the
> transaction?

Note that a similar error can also occur when an application does mmap IO.
The Linux kernel does not prevent the application from modifying a page
even while it is being RDMA'd over the network, so the data can change
after Lustre has computed its checksum, which makes it hard for Lustre to
provide a checksum that still matches what arrives at the OST.
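
As a rough illustration of that race (a toy model only, not Lustre code;
the function name send_with_checksum and the mutate_in_flight callback are
hypothetical, and zlib.crc32 merely stands in for whatever checksum is
really used), a page that is rewritten after it has been checksummed but
before it has been read for transfer produces a mismatch with no network
fault involved:

```python
import zlib

def send_with_checksum(page, mutate_in_flight=None):
    """Toy model of a bulk write: checksum the page, then transfer it.

    mutate_in_flight is a hypothetical hook standing in for an
    application that writes to the page through its mmap mapping.
    """
    client_cksum = zlib.crc32(bytes(page))   # checksum taken before transfer
    if mutate_in_flight is not None:
        mutate_in_flight(page)               # page changes after checksumming
    wire_data = bytes(page)                  # what the transfer actually reads
    server_cksum = zlib.crc32(wire_data)     # what the receiver computes
    return client_cksum, server_cksum

def scribble(page):
    page[0] ^= 0xFF                          # the application rewrites one byte

page = bytearray(b"A" * 4096)
quiet = send_with_checksum(bytearray(page))
racy = send_with_checksum(bytearray(page), scribble)
print(quiet[0] == quiet[1], racy[0] == racy[1])
```

In the quiet case the two checksums agree; in the racy case they differ
even though nothing was corrupted on the wire.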

The client would have printed a message like the following in that case:

	"BAD WRITE CHECKSUM: changed in transit AND doesn't match the
	 original - likely false positive due to mmap IO (bug 11742)"

If the client's copy of the data has not changed and its checksum is
still correct, the error points to data corruption on the network
(probably in the NIC itself, if the problem is specific to one node).
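
The distinction can be sketched as follows. This is a simplified triage
under stated assumptions, not the actual Lustre logic: the function name
classify_bad_write, its parameters, and the returned strings are all
hypothetical, and zlib.crc32 is only a stand-in checksum.

```python
import zlib

def classify_bad_write(original_cksum, client_data, server_cksum):
    """Hypothetical triage after the server reports a checksum mismatch.

    original_cksum: checksum the client sent along with the write
    client_data:    the client's current copy of the data
    server_cksum:   checksum the server computed on what it received
    """
    if server_cksum == original_cksum:
        return "ok"
    # Re-checksum the client's copy to see whether it changed locally.
    if zlib.crc32(client_data) != original_cksum:
        return "client copy changed after checksumming (possible mmap IO)"
    return "client copy unchanged: suspect network or NIC corruption"

data = b"B" * 4096
sent = zlib.crc32(data)
damaged = b"C" + data[1:]                    # what the server received
print(classify_bad_write(sent, data, zlib.crc32(damaged)))
```

Here the client's copy still matches the checksum it sent, so the sketch
blames the path between client memory and the OST.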

Note that since the NIC is doing the TCP checksumming itself, this kind
of error won't be caught by TCP packet checksums because the data is
already corrupted in the NIC memory before the TCP checksum is computed.

This specific problem was actually hit by a customer and is one of the
reasons why Lustre does its own data checksum, instead of depending on
the TCP layer to deliver the data without any errors.
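
A small simulation shows why the TCP checksum passes while an end-to-end
checksum fails. It is a sketch under stated assumptions: the function
simulate_offloaded_send and the corrupt_in_nic flag are hypothetical,
the Internet checksum is computed over the payload only (real TCP also
covers the header and a pseudo-header), and zlib.crc32 stands in for the
application-level checksum.

```python
import zlib

def internet_checksum(data):
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole 16-bit word
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def simulate_offloaded_send(payload, corrupt_in_nic):
    app_cksum = zlib.crc32(payload)          # end-to-end, computed on the host
    nic_buf = bytearray(payload)             # copy into (hypothetical) NIC memory
    if corrupt_in_nic:
        nic_buf[0] ^= 0x01                   # bit flip before TCP checksumming
    tcp_cksum = internet_checksum(bytes(nic_buf))   # NIC checksums what it holds
    received = bytes(nic_buf)                # the wire itself is error-free here
    tcp_ok = internet_checksum(received) == tcp_cksum
    e2e_ok = zlib.crc32(received) == app_cksum
    return tcp_ok, e2e_ok

print(simulate_offloaded_send(b"x" * 4096, corrupt_in_nic=True))
```

With the corruption enabled, the TCP check succeeds because it was
computed over the already damaged buffer, and only the checksum taken on
the host before the data reached the NIC notices the change.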

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



