[Lustre-devel] Checksum Algorithm
Andreas Dilger
adilger at sun.com
Wed Nov 7 12:05:20 PST 2007
On Nov 06, 2007 11:59 -0500, RS RS wrote:
> We have seen a huge performance drop in 1.6.3, due to the checksum
> being enabled by default. I looked at the algorithm being used, and it is
> actually a CRC32, which is a very strong algorithm for detecting all sorts
> of problems, such as single bit errors, swapped bytes, and missing bytes.
> I've been experimenting with using a simple XOR algorithm. I've
> been able to recover most of the lost performance. This algorithm
> will detect corrupted bytes and words. This algorithm will not
> detect swapped-byte errors, but I think that these are pretty rare.
> This algorithm will not detect missing bytes, but I suspect that other
> things in Lustre or LNET will detect this problem. This algorithm will
> not detect two errors that offset each other, such as a single bit error
> in two words that are a multiple of 4 bytes apart.
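The trade-offs listed above can be illustrated with a small sketch. This is not the patch under discussion and not Lustre code: xor_words is a hypothetical helper that XORs a list of 32-bit words into a single value, and the sample words are made up. It only shows which kinds of corruption change a plain XOR sum and which do not.

```shell
#!/bin/sh
# Hypothetical helper (not from Lustre): fold 32-bit words into one
# checksum by XORing them together, and print the result as 8 hex digits.
xor_words() {
    sum=0
    for w in "$@"; do
        sum=$(( (sum ^ $w) & 0xFFFFFFFF ))
    done
    printf '%08x\n' "$sum"
}

# Made-up sample data: three 32-bit words.
good=$(xor_words 0x11223344 0xAABBCCDD 0x00000001)

# One bit flipped in the first word: the checksum changes (detected).
flipped=$(xor_words 0x11223345 0xAABBCCDD 0x00000001)

# First two words swapped: XOR is order-independent (not detected).
swapped=$(xor_words 0xAABBCCDD 0x11223344 0x00000001)

# Same bit flipped in two different words: the errors cancel (not detected).
offset=$(xor_words 0x11223345 0xAABBCCDC 0x00000001)

[ "$flipped" != "$good" ] && echo "single bit error: detected"
[ "$swapped" = "$good" ] && echo "swapped words: missed"
[ "$offset" = "$good" ] && echo "offsetting errors: missed"
```

A CRC32 mixes each input bit into the result in a position-dependent way, which is why it catches the reordering case that a plain XOR misses, at a higher CPU cost.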
Note that it is possible to disable checksums to get the previous behaviour
back at runtime with (on all clients that should skip checksums):

    for C in /proc/fs/lustre/osc/*/checksums; do
        echo 0 > $C
    done

in the Lustre configuration:

    mgs> lctl conf_param testfs-OST0001.osc.checksums=0

or at compile time with "configure --disable-checksum ..."
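For comparing performance with and without checksums, the runtime loop can be wrapped in a small function. This is only a sketch: set_checksums and its optional directory argument are hypothetical names added for illustration, and it assumes that writing 1 to the same file turns checksums back on, which is not stated above.

```shell
#!/bin/sh
# Hypothetical wrapper around the runtime loop. The first argument is the
# value to write (0 = off; 1 = on is an assumption). The optional second
# argument overrides the base directory, which is handy for a dry run.
set_checksums() {
    value=$1
    base=${2:-/proc/fs/lustre/osc}
    for f in "$base"/*/checksums; do
        # Skip entries that do not exist or are not writable.
        if [ -w "$f" ]; then
            echo "$value" > "$f"
        fi
    done
}

# Example use on a client, as root:
#   set_checksums 0    # skip checksums for the benchmark run
#   set_checksums 1    # restore them afterwards (assumed value)
```

Like the loop it wraps, this only affects the client it is run on, so it has to be repeated on every client that should skip checksums.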
Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.