[Lustre-devel] Checksum Algorithm
RS RS
rs2006ts at hotmail.com
Tue Nov 6 08:59:26 PST 2007
Hi,
We have seen a large performance drop in 1.6.3 because checksums are now enabled by default. I looked at the algorithm being used, and it is a CRC32, which is a strong algorithm for detecting many kinds of problems, such as single-bit errors, swapped bytes, and missing bytes.
I've been experimenting with a simple XOR algorithm, and I've been able to recover most of the lost performance. This algorithm will detect corrupted bytes and words. It will not detect swapped-byte errors, but I think these are pretty rare. It will not detect missing bytes, but I suspect that other things in Lustre or LNET will catch that problem. It will also not detect two errors that offset each other, such as the same bit flipped at two locations that are a multiple of 4 bytes apart.
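To make the trade-off concrete, here is a minimal sketch of a word-wise XOR checksum. It is only an illustration, not the code I have been testing and not anything from the Lustre tree: the helper names `xor32` and `flip_bit` are made up, it assumes 32-bit little-endian words, and Python's `zlib.crc32` is used as a stand-in for a CRC32, which may differ in detail from the one Lustre computes.

```python
import struct
import zlib


def xor32(data: bytes) -> int:
    """XOR all 32-bit little-endian words of data together (hypothetical helper)."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    acc = 0
    for (word,) in struct.iter_unpack("<I", data):
        acc ^= word
    return acc


def flip_bit(data: bytes, byte_index: int, bit: int) -> bytes:
    """Return a copy of data with one bit inverted (hypothetical helper)."""
    buf = bytearray(data)
    buf[byte_index] ^= 1 << bit
    return bytes(buf)


original = bytes(range(64))
good = xor32(original)

# A single flipped bit changes the XOR checksum, so it is detected.
one_flip = flip_bit(original, 5, 3)
print("single bit flip detected:", xor32(one_flip) != good)

# Flipping the same bit 8 bytes further on cancels the first error.
two_flips = flip_bit(one_flip, 13, 3)
print("offsetting flips detected by XOR:", xor32(two_flips) != good)
print("offsetting flips detected by CRC32:",
      zlib.crc32(two_flips) != zlib.crc32(original))

# XOR ignores ordering, so exchanging two whole words goes unnoticed.
swapped = original[4:8] + original[0:4] + original[8:]
print("swapped words detected by XOR:", xor32(swapped) != good)
```

The point of the sketch is that XOR is position-independent across words: any pair of changes that land on the same bit position of two words, or any reordering of whole words, leaves the checksum unchanged, while a CRC also depends on where each bit sits in the buffer.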
Should we consider using a more efficient checksum algorithm, in order to regain performance? Should the algorithm be configurable?
-Roger