Testing Hash Functions Using Dieharder
General:
A while back one of my patches was accepted into dieharder
that makes it easier to test hash functions. Just write a program
that feeds incrementally larger blocks of a constant value to your
hash function. Your program should write each hash value to
stdout in binary format and pipe the result to dieharder as
follows:
generate_hash_values | dieharder -a -g 200
Advice to Hash Function Authors:
I'm not competent to give advice about the inner workings of a
hash function, but I think I can offer some sound advice about the
outer workings. Mainly, make sure your hash function is testable,
and this largely boils down to making sure your hash function can
generate a large number of values in linear time when it is fed a
stream of constants. If each hash value has to be recalculated
from scratch, its O(N^2) which is too slow for the large number of
values consumed by dieharder.
I think it is instructive to review the changes Austin Appleby
applied to MurmurHash
2 (which cannot be tested with dieharder) to produce MurmurHash
2A (which can be tested with dieharder). I would also like to
take this opportunity to thank Austin for the e-mail exchange we
had. For non-experts like myself, picking a hash function is
largely a question of whether you trust the author, and the speed
and accuracy with which he produced MurmurHash 2A has informed my
decision (which is as close as I can come to endorsing something
as nebulous as a hash function). In the spirit of equal time, I'm
also enamored of the Jenkins One-at-a-Time hash (and really any
hash function that has no dieharder failures). I would pick
whichever is faster (or fastest) for your input, or you might want
to run the dieharder tests for yourself as there are some minor
differences in results.
Hash Function Results:
Source Code: