Testing Hash Functions Using Dieharder


A while back one of my patches was accepted into dieharder that makes it easier to test hash functions. Just write a program that feeds incrementally larger blocks of a constant value to your hash function. Your program should write each hash value to stdout in binary format and pipe the result to dieharder as follows:
        generate_hash_values | dieharder -a -g 200

Advice to Hash Function Authors:

I'm not competent to give advice about the inner workings of a hash function, but I think I can offer some sound advice about the outer workings. Mainly, make sure your hash function is testable, and this largely boils down to making sure your hash function can generate a large number of values in linear time when it is fed a stream of constants. If each hash value has to be recalculated from scratch, its O(N^2) which is too slow for the large number of values consumed by dieharder.

I think it is instructive to review the changes Austin Appleby applied to MurmurHash 2 (which cannot be tested with dieharder) to produce MurmurHash 2A (which can be tested with dieharder). I would also like to take this opportunity to thank Austin for the e-mail exchange we had. For non-experts like myself, picking a hash function is largely a question of whether you trust the author, and the speed and accuracy with which he produced MurmurHash 2A has informed my decision (which is as close as I can come to endorsing something as nebulous as a hash function). In the spirit of equal time, I'm also enamored of the Jenkins One-at-a-Time hash (and really any hash function that has no dieharder failures). I would pick whichever is faster (or fastest) for your input, or you might want to run the dieharder tests for yourself as there are some minor differences in results.

Hash Function Results:

Source Code: