Testing Hash Functions Using Dieharder

General:

A while back one of my patches was accepted into dieharder that makes it easier to test hash functions. Just write a program that feeds incrementally larger blocks of a constant value to your hash function. Your program should write each hash value to stdout in binary format and pipe the result to dieharder as follows:

        generate_hash_values | dieharder -a -g 200

Advice to Hash Function Authors:

I'm not competent to give advice about the inner workings of a hash function, but I think I can offer some sound advice about the outer workings. Mainly, make sure your hash function is testable, and this largely boils down to making sure your hash function can generate a large number of values in linear time when it is fed a stream of constants. If each hash value has to be recalculated from scratch, its O(N^2) which is too slow for the large number of values consumed by dieharder.

I think it is instructive to review the changes Austin Appleby applied to MurmurHash 2 (which cannot be tested with dieharder) to produce MurmurHash 2A (which can be tested with dieharder). I would also like to take this opportunity to thank Austin for the e-mail exchange we had. For non-experts like myself, picking a hash function is largely a question of whether you trust the author, and the speed and accuracy with which he produced MurmurHash 2A has informed my decision (which is as close as I can come to endorsing something as nebulous as a hash function). In the spirit of equal time, I'm also enamored of the Jenkins One-at-a-Time hash (and really any hash function that has no dieharder failures). I would pick whichever is faster (or fastest) for your input, or you might want to run the dieharder tests for yourself as there are some minor differences in results.

Hash Function Results:

Jenkins One-at-a-Time Hash has no failures.
MurmurHash 2A has no failures.

Hsieh SuperFastHash has the following failures, but the problem could just be with the guidance given in "Update(5)" for producing an incremental version of this hash. Before the e-mail exchange mentioned above, I applied the same guidance to MurmurHash 2 and did real damage to its hash results. So the point is that the real (non-incremental) version of SuperFastHash could have better results than what is shown below for the incremental version:

          FAILED ... RGB Bit Distribution Test
          FAILED ... RGB Generalized Minimum Distance Test
          FAILED ... RGB Lagged Sum Test
          FAILED ... RGB Permutations Test
          FAILED ... STS Monobit Test
          FAILED ... STS Runs Test
          FAILED ... STS Serial Test (Generalized)
          FAILED ... Diehard Count the 1s (stream) Test
          FAILED ... Diehard Craps Test
          FAILED ... Diehard OPSO
          FAILED ... Diehard OQSO Test
          FAILED ... Diehard Squeeze Test
          FAILED ... Lagged Sum Test
          FAILED ... Marsaglia and Tsang GCD Test

Testing Hash Functions Using Dieharder

General:

Advice to Hash Function Authors:

Hash Function Results:

Source Code: