[Lustre-devel] Checksum Algorithm

RS RS rs2006ts at hotmail.com
Tue Nov 6 08:59:26 PST 2007


We have seen a huge performance drop in 1.6.3, due to the checksum being enabled by default.  I looked at the algorithm being used, and it is actually a CRC32, which is a very strong algorithm for detecting all sorts of problems, such as single bit errors, swapped bytes, and missing bytes.

I've been experimenting with a simple XOR algorithm, and I've been able to recover most of the lost performance.  This algorithm will detect corrupted bytes and words.  It will not detect swapped-byte errors, but I think that these are pretty rare.  It will not detect missing bytes, but I suspect that other things in Lustre or LNET will detect this problem.  It also will not detect two errors that offset each other, such as the same bit flipped in two words that are a multiple of 4 bytes apart.
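
A minimal sketch of the trade-off described above, assuming the XOR is taken over 32-bit words. The names xor32, crc32_bitwise and xor32_selfcheck are hypothetical and are not taken from the Lustre source; the CRC shown is the common reflected CRC-32 (polynomial 0xEDB88320) in its slow bit-at-a-time form, included only for comparison and not as a statement of how Lustre computes it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: XOR of all 32-bit words in the buffer. */
uint32_t xor32(const uint32_t *words, size_t nwords)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < nwords; i++)
        sum ^= words[i];
    return sum;
}

/* Bit-at-a-time reflected CRC-32 (polynomial 0xEDB88320), for comparison. */
uint32_t crc32_bitwise(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Exercises the three cases from the text on a made-up 16-byte buffer. */
void xor32_selfcheck(void)
{
    const uint32_t good[4]    = {0x11111111u, 0x22222222u, 0x33333333u, 0x44444444u};
    const uint32_t flipped[4] = {0x11111111u, 0x22222223u, 0x33333333u, 0x44444444u};
    const uint32_t swapped[4] = {0x22222222u, 0x11111111u, 0x33333333u, 0x44444444u};
    const uint32_t offset[4]  = {0x11111110u, 0x22222223u, 0x33333333u, 0x44444444u};

    /* A single flipped bit changes the XOR, so it is detected. */
    assert(xor32(flipped, 4) != xor32(good, 4));
    /* Two swapped words leave the XOR unchanged, so the swap is missed. */
    assert(xor32(swapped, 4) == xor32(good, 4));
    /* The same bit flipped in two words cancels out, so it is missed. */
    assert(xor32(offset, 4) == xor32(good, 4));

    /* The CRC-32 differs from the good buffer in both missed cases. */
    assert(crc32_bitwise(swapped, sizeof swapped) != crc32_bitwise(good, sizeof good));
    assert(crc32_bitwise(offset, sizeof offset) != crc32_bitwise(good, sizeof good));
}
```

Calling xor32_selfcheck() from a small main() runs the checks. The XOR loop does one operation per word, while this CRC form does eight shift-and-mask steps per byte; table-driven CRC implementations narrow that gap but still do more work per byte than a plain XOR.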

Should we consider using a more efficient checksum algorithm, in order to regain performance?  Should the algorithm be configurable?  

