﻿ Pluto Scarab — Hash Function Structure

# Hash Functions (continued)

## Simple Hash Function

Let's look at a simple hash function. Figure 1 shows a hash function intended to be used for a hash table where the keys are character strings.

The hash value is initialized to zero at the start of the function, then for each character in the string the hash value is multiplied by 33 and the Unicode code point value is added to the hash value. No overflow checking is specified, so in C# the value will just wrap around within the range of System.UInt32 if it gets big enough.

The multiplication by 33 is called the mixing step and the function f : `uint``uint` defined by f(x) = x * 33 where * is the C# multiply-without-overflow operator is called the mixing function. The constant 33 is chosen arbitrarily.

The addition of the Unicode code point value of the character is called the combining step and the function g : `uint` × `uint``uint` defined by g(x, y) = x + y where + is the C# add-without-overflow operator is called the combining function.

Note that this particular hash function has dubious uniform distribution properties. We'll evaluate that more later.

## Hash Function Structure

The preceding hash function was too simple to illustrate all of the structures commonly found in hash functions. Figure 2 shows a more complete example.

1. First is the initialization step, where the internal state of the hash function is readied.
2. Next, the message is split up into zero or more blocks of equal size, and the internal state of the hash function is updated for each block. In this example, the message is divided into 32-bit blocks. For each block, the combining function is applied to the prior state and the bits from the current block, and then the mixing function is called. This step is repeated for each full-sized block in the message.
3. Then the final, partial block is processed, if necessary. The combining step is modified to accomodate the incomplete block.
4. Finally, some final processing is done to further randomize the internal state. In this case, the mixing step is applied two more times.

If you examine the hash function from figure 1 again, you'll see that it almost follows this structure. The initialization step is there (the hash value is set to zero), and the message is processed in blocks of 16-bits. There is no post-processing step, or rather there is an "empty" post-processing step.

The main difference is that in figure 1 the mixing step is performed before the combining step for each block. And since there is no post-processing step, it's clear that the bits from the last message block are not processed by the mixing function and therefore do not affect the upper 16 bits of the hash value. If you were to use this hash function for hash table lookup you would want to be sure to use the lower-order bits of the hash value instead of the upper-order bits.

Later we'll examine the mysterious constants in this program, in the initialization step and in the mixing function, to determine whether this is a "good" hash function.