The Use of Substitution Boxes in Hash Functions
A substitution box (or S-box) is simply a lookup-table. Use of an S-box provides a simple means for confusing the relationship between the input keys and the hash result.
S-boxes are used extensively in cryptographic hashes, and in those applications the values used in the S-box are often chosen very carefully to avoid specific classes of cryptographic attack. The values chosen for the S-box in this simple example were generated randomly.
Figure 1 shows a hash function based on an S-box. Although this function takes more memory than the other functions we've looked at, the implementation of the function is very simple. The S-box contents can fit in the on-chip CPU cache for most modern CPUs. The mixing function can be implemented in just a few CPU instructions.
The 4-bit hash function of Figure 4 on page 4 was an example of an S-box hash. That example used a carefully selected S-box to ensure that the strict avalanche criteria was satisfied exactly. Theoretically you could construct a mixing function that used an S-box with 232 values to produce a SAC-perfect mixing function but that would use an enormous amount of memory (128 GB). So for this function I took a shortcut and just used 256 values, and then I did some simple shifting and adding to make the hash dependent on the positions of the key bytes.
Even though this hash function is very simple, the random nature of the S-box values causes the input bits to be scrambled very effectively. This hash function achieves avalanche for all input bits, and passes the χ2 test for all upper and lower bit combinations for all three types of keys.
S-Boxes in Cryptography
Like most of the hash functions presented in this site, this hash function is intended for hash-table use, not cryptographic use. But randomized S-boxes can play an important role in cryptography.
Often, S-box values are carefully chosen to avoid specific types of cryptographic attack. For example linear cryptanalysis can be used to attacked cryptographic functions that use linear boolean functions (e.g. you would not want the S-box values to be a linear function of the input values). Differential cryptanalysis can be used to attack S-boxes that produce similar values for similar inputs.
S-boxes generated randomly are clearly not optimized to protect against either of these types of attacks, nor would you expect them to provide optimal protection against any specific type of attack. But on the other hand, randomly-generated S-boxes should be relatively immune to a broad spectrum of attacks. It has already been proven that random S-boxes of sufficient size are almost guaranteed to be resistant to both linear and differential cryptanalysis. If the S-box values are random, there is no systematic relationship between values and therefore no systematic weakness.