[Lustre-devel] Checksum Algorithm
rs2006ts at hotmail.com
Tue Nov 6 08:59:26 PST 2007
We have seen a huge performance drop in 1.6.3, due to the checksum being enabled by default. I looked at the algorithm being used, and it is actually a CRC32, which is a very strong algorithm for detecting all sorts of problems, such as single bit errors, swapped bytes, and missing bytes.
I've been experimenting with a simple XOR algorithm, and I've been able to recover most of the lost performance. This algorithm will detect corrupted bytes and words. It will not detect swapped-byte errors, but I think these are pretty rare. It will not detect missing bytes, but I suspect that other things in Lustre or LNET will catch that problem. It also will not detect two errors that offset each other, such as the same bit flipped in two words that are a multiple of 4 bytes apart.
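To make the trade-off concrete, here is a minimal sketch of a word-wise XOR checksum, assuming 32-bit words. The names xor32_checksum and xor32_demo are hypothetical and this is not the actual Lustre code; it only illustrates which errors such a checksum catches and which it misses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: XOR together every 32-bit word in the buffer.
 * A trailing partial word is zero-padded before being folded in. */
static uint32_t xor32_checksum(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 4 <= len; i += 4) {
        uint32_t w;
        memcpy(&w, p + i, sizeof(w));   /* avoids unaligned access */
        sum ^= w;
    }
    if (i < len) {
        uint32_t w = 0;
        memcpy(&w, p + i, len - i);
        sum ^= w;
    }
    return sum;
}

/* Illustrates the detection properties discussed above. */
static void xor32_demo(void)
{
    uint32_t data[4] = { 0x11111111u, 0x22222222u, 0x33333333u, 0x44444444u };
    uint32_t good = xor32_checksum(data, sizeof(data));
    uint32_t tmp;

    /* A single flipped bit changes the checksum: detected. */
    data[1] ^= 0x00000100u;
    assert(xor32_checksum(data, sizeof(data)) != good);

    /* The same bit flipped in another word cancels out: NOT detected. */
    data[3] ^= 0x00000100u;
    assert(xor32_checksum(data, sizeof(data)) == good);

    /* Swapping two whole words leaves the XOR unchanged: NOT detected. */
    tmp = data[0];
    data[0] = data[2];
    data[2] = tmp;
    assert(xor32_checksum(data, sizeof(data)) == good);
}
```

Calling xor32_demo() runs through the three cases. XOR is order-independent, so anything that rearranges whole words, or flips the same bit position an even number of times, passes unnoticed; a CRC32 would catch those cases at a higher CPU cost.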
Should we consider using a more efficient checksum algorithm, in order to regain performance? Should the algorithm be configurable?