[Lustre-discuss] Checksums of files on disk

Oleg Drokin green at whamcloud.com
Wed May 25 14:10:09 PDT 2011


Hello!

On May 25, 2011, at 6:26 AM, Christopher J. Walker wrote:

> The application I use, StoRM[1] can store checksums on disk in an
> extended user attribute - and use that to ensure the integrity of files
> on disk. The algorithm currently used is adler32. The intention is to
> perform end to end checksumming from file creation through storage,
> transfer over the WAN and storage at a site.
> 
> I see that Lustre has some checksum support (though not for checksumming
> the file on the OST - so we'd still need to use the user attribute for
> that).
> Is the value of the checksum user accessible? Or to be more specific,
> I'd potentially get a big speedup if I were able to ask the diskserver
> to tell me the checksum of a file without actually transferring it over
> the network. Is it easy to do this?

Note that Lustre checksumming covers data on the wire. It is not a checksum
of the entire file or object; only the data currently in transit is checksummed.

Moreover, the clients have no way to know what the checksum of the entire
file would be unless they read the entire thing in to perform the calculation.
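
To illustrate what "read the entire thing in" means in practice, here is a
minimal sketch of computing a whole-file adler32 by streaming the file in
chunks. It uses Python's standard zlib module; the function name
file_adler32 and the 1 MiB chunk size are my own choices for illustration,
not anything from StoRM or Lustre.

```python
import zlib


def file_adler32(path, chunk_size=1 << 20):
    """Return the Adler-32 of the whole file at `path`.

    Hypothetical helper: every byte is read, but only one chunk
    is held in memory at a time.
    """
    checksum = 1  # Adler-32 starts from 1, not 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty read means end of file
                break
            # Feed the running value back in to continue the checksum
            checksum = zlib.adler32(chunk, checksum)
    return checksum & 0xFFFFFFFF
```

The result is the same as checksumming the file contents in one call, so
every byte still has to travel to wherever this code runs.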

> [1] http://storm.forge.cnaf.infn.it/home This is an SRM implementation
> we use to provide grid authentication for our storage (we store data for
> the LHC).

I had a cursory look and don't see much discussion about checksum
implementations.
I hope you have finer granularity than just "file" or "object".
Checksumming in blocks of some chosen size would be good, I think.
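
As a rough sketch of what I mean by block granularity: keep one checksum
per fixed-size block, so a damaged region can be located without
re-verifying the whole file. The function name block_adler32 and the 4 MiB
default below are assumptions for illustration only; I don't know what
StoRM actually does here.

```python
import zlib


def block_adler32(path, block_size=4 << 20):
    """Return a list with one Adler-32 per `block_size` block of the file.

    Hypothetical helper: the last block may be shorter than
    `block_size`; an empty file yields an empty list.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    checksums = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:  # end of file
                break
            # Each block is checksummed independently of the others
            checksums.append(zlib.adler32(block) & 0xFFFFFFFF)
    return checksums
```

Comparing two such lists index by index shows which blocks differ, and
only those need to be re-read or re-transferred.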

(BTW, on the first page there it says "Lustre by Sun Microsystems"
which is now somewhat stale.)

Bye,
    Oleg
--
Oleg Drokin
Senior Software Engineer
Whamcloud, Inc.

