Evaluating Hash Functions
In the preceding section we looked at the avalanche behavior of mixing functions. Now we're ready to look at the behavior of full hash functions including initialization, combining, mixing, and post-processing steps. We'll evaluate the avalanche, distribution, and correlation performance of several existing string hash functions, including a hash with 96-bit internal state by Bob Jenkins, the Fowler/Noll/Vo (FNV) 32-bit hash, a simplistic hash for a baseline comparison, and a cryptographic hash.
First we'll describe the tests and provide results for our baseline hash function (defined below), and in subsequent sections we'll evaluate each hash function individually. Finally, we'll summarize the results for all of our test candidates.
What to Look For in a Hash Function
Before we do some testing, I want to point out that the true test involves testing the hash function in situ. Different applications have different hash requirements, and what works well in one situation may not work well in another. What we're going to look at is the generic behavior of hash functions, assuming we know nothing about the keys that are going to be hashed. This should tell us something about the worst-case performance of the functions.
Since we assume no knowledge about the hash keys, we'll use completely random keys. Keys will be arrays of octets. We'll use keys that consist of uniformly distributed octets, keys that simulate text, and keys that are very bit-sparse (a single "1" bit per octet). Why not just use random keys with uniformly distributed octets? Because a hash function isn't truly needed in that scenario. If you know ahead of time that your keys are composed of completely random bits, you can just use a subsection of the key as your hash value and be guaranteed perfect distribution properties. Adding "text" keys and sparse keys will illustrate a broader range of real-world requirements.
Good general purpose hash functions should have these properties:
- The output should be uniformly distributed. Each pattern of output bits should be equally likely. We'll use a one-tailed χ2 bucket test for this. See this site by Amar Patel for a very clear explanation of the χ2 test, or this page for a more formal explanation.
- Every input bit should affect every output bit about 50% of the time if you consider all possible combinations of the other input bits. This is the avalanche condition. Conversely, every output bit should be affected by every input bit. This is similar to the "no-funnels" condition described by Bob Jenkins. We'll need to modify our avalanche evaluator to handle variable-sized input bit vectors.
- There should be no correlations between pairs of output bits. If output bits are correlated, you're not getting a full n bits of output. We won't test for this because it is easy for all but the most unsuitable hash functions to avoid these correlations.
- It should be computationally infeasible to identify two keys that produce the same hash value or to identify a key that produces a given hash value. This is a requirement for cryptographic hashes, which are much longer than 32-bits. Since it's easy to find collisions for even an ideal 32-bit hash, this isn't a requirement that we'll use.
We first need to create a test harness for hash functions just like we did
for mixing functions. Figure 1 shows the
base class which we'll use for all the hashes we'll investigate. We're going
to assume a 32-bit hash output for all the hash functions.
Different types of keys require different lengths. In order for the χ2 tests with 216 buckets to be valid, all of the keys need to contain at least 16 bits of information. The sparse keys only contain three bits of information in each octet, so we need at least six octets to ensure 16 bits of information. The "text" keys contain 4.34 bits per octet on average, so we need at least four octets for that one. We only need two octets for the uniformly distributed octets.
Figure 2 lists a rudimentary hash function which we'll use to work out our
testing methodology. The constant
0x50003 was selected for the
ability for the "3" to mix key bits into the lower half of the hash value
and the "5" to mix key bits into the upper half. We avoid using the same number
for both halves to minimize correlations between each set of bits.
Testing Uniform Distribution
Hash functions used with hash tables of size 2n would typically use only the lower-order bits of a 32-bit hash function output. But to be thorough in our tests we'll look at the distribution properties of both the low-order and high-order bits, up to 16 bits long.
For each test we'll call the hash function repeatedly with random keys, and look to see which bucket we're assigned. We'll count the hits to each bucket, and compare the result to the expected result for a truly uniform distribution. We'll use the one-tailed χ2 test for this. (We don't care if the distribution is "too uniform" since we're throwing random keys at each function. If we were testing a random number generator we might want a two-tailed test.) The χ2 degrees of freedom for each measurement will be 2m-1, where m is the bit-size of the sub-range that we're testing.
The result of each χ2 test will be a p-value, which is a number from 0.0 to 1.0, and we would expect these to be uniformly distributed in that range if the hash output is uniform. The p-value indicates the probability that a truly uniform distribution would be worse that the observed distribution by chance alone. For example a p-value of 0.01 indicates that a truly uniform distribution would only be worse than the observed distribution only 1% of the time, which is bad.
Even if we define p=0.01 as the cut-off for a "bad" distribution, a truly uniform distribution would have a p-value worse than this 1% of the time, by definition. So we need to look at each failure in context to see if it's likely just random chance or whether it forms part of a pattern. In general I will re-run a test several times to determine which results are significant and which are not, and I only present the most representative results. I wouldn't do this in a scientific context, but this isn't that.
For each test, we'll generate enough random keys to fill each bucket with
an average of 100 keys. The key length will be a random variable
k+floor(sqrt(-800*ln(x))) where x is a uniformly distributed
random variable on (0, 1] and k is the minimum key length required to
produce 16 bits of data in the key, which varies by the type of key.
floor indicates rounding down,
indicates square-root, and
ln is the natural logarithm. We'll need
a minimum key length of at least k because we'll be doing χ2 tests
with up to 216 buckets, and the presence of some too-short keys would
force non-uniform output in those cases and cause those tests to fail regardless
of the hash function used.
Figure 3 shows the results for the
SimpleHash function. It shows
weakness in both the upper and lower bits when use 14 or more of the bits.
The upper bits are a little stronger--this half of the hash gets some "overflow"
from multiplications that carry out of the lower half.
If a hash function passes the uniform distribution test, why do any of the other tests matter? While it's true that the primary use for most hashes is for hash table lookup, that's not the only use. Small hashes can be used for non-cryptographic document fingerprinting, for example. Also, other tests can identify classes of keys for which the uniform distribution might fail even if the tests with random keys succeed.
Unlike the previous section which pursued the Strict Avalanche Condition to paranoid levels, we'll consider "good enough" avalanche to be when each input bit affects each output bit between 1/3rd and 2/3rds of the time. The mixing function will be applied repeatedly in most cases, which will only improve the avalanche behavior if it's good enough to start with. If single-round avalanche is borderline, we might consider calling the mixing function one or two more times as part of the post-processing step. This test will tell is which functions would benefit from that.
The definition of avalanche requires that we only evaluate keys with uniformly distributed input octets, so this test won't use GetTextKeys or GetSparseKeys. Also, it's challenging to test every input bit when keys get very long, so we're going to restrict keys to specific lengths. We'll examine two-octet keys which will allow us to exactly calculate the avalanche matrix, as well as four-octet keys for which we'll need to do sampling just as before. Finally we'll use 256-octet keys to see the avalanche behavior when multiple rounds of the mixing function are used, and we'll look at input bits in the first and last octets.
The 16x32 and 32x32 avalanche matrices are cumbersome to present on a web page, so we'll use a graphical summary of the results instead. We'll use green squares for bit combinations that achieved avalanche, orange squares that did not, and red squares where the input bits had either no affect or a direct affect on the output (0% or 100% affect).
Figure 4 shows the results of the avalanche tests for the
function. Next we'll look at some other well-known hash functions.