A scheme for solving Quagmire IV ciphers without a crib.

This scheme works well provided the ciphertext is long enough -- about 300 or more letters, and the period is short enough -- each shifted alphabet contains about 30 or more letters.

A Quagmire IV cipher has three keys: a plaintext key, a codetext key, and an indicator key (see example key array in the next paragraph). Given a Quagmire IV cipher, our solving scheme is to read through a list of words or phrases, searching for the cipher's codetext key. We assign a score to each trial key by a method described below. The scoring method involves constructing a simple substitution cipher whose solution is the plaintext of our Quagmire IV cipher. Hopefully, the trial key with the highest score will turn out to be the actual codetext key, and solving the associated simple substitution cipher will give us the plaintext of the Quagmire IV. In what follows we assume we know the cipher's period. Usually we just take the period with the maximum periodic index of coincidence.

Here's an example key array with CELERY as the plaintext key, CORN as the codetext key (also called the tableau key), and MUSHROOMS as the indicator key.

celryabdfghijkmnopqstuvwxz

MPQSTUVWXYZCORNABDEFGHIJKL

UVWXYZCORNABDEFGHIJKLMPQST

STUVWXYZCORNABDEFGHIJKLMPQ

HIJKLMPQSTUVWXYZCORNABDEFG

RNABDEFGHIJKLMPQSTUVWXYZCO

ORNABDEFGHIJKLMPQSTUVWXYZC

ORNABDEFGHIJKLMPQSTUVWXYZC

MPQSTUVWXYZCORNABDEFGHIJKL

STUVWXYZCORNABDEFGHIJKLMPQ

OK, suppose the computer reads a trial key such as CORN from the list. How do we assign it a score and an associated simple substitution cipher? Here are the steps:

(1) Extend the trial key to a complete keyed alphabet.

(2) Construct a key block by choosing the "highest scoring" shifts of the alphabet against itself. The methods of scoring the shifts are described below.

(3) Use these shifts to produce a simple substitution cipher whose solution may be the plaintext of the Quagmire IV.

(4) Score the simple substitution cipher, and if it is a new maximum, output the trial key, the shifts, and the cipher.

If all goes well, solving the simple substitution cipher with the highest score will yield the plaintext we are looking for.

Let's take a specific Quagmire IV and use it to illustrate the details of the method. Here's the cipher:

MMVVS WUPVU GZHXA JBKFV VHBJR WXQBS IVIRJ XFKFY XOIFK SIVVV RFGRU LCCBV IPDBF TZWUI PHHBJ IHQAY FGAFE WQUDT UEOCS FRAOQ BIUQQ YCDBI TZKPH BKGWN XNVIP UUXOI VVVPU PKSCT RVZXG VLIBY DFOBH ZPDRG PTGRF KZIRA IZVUC SLRWZ BQEMW IRSCP VJAJR BFTPF TQWQE PXQFF NWQWG FXKNT JVVZT TQTKN HRGJC GONSG DSFXR

This cipher has a maximum periodic index of coincidence of 0.068 for a period of 9. We assume that 9 is the correct period.

Suppose we have just read from our list of words and phrases the trial key: CORN. We extend it to a compete keyed alphabet, getting:

CORNABDEFGHIJKLMPQSTUVWXYZ.

How do we get the "highest scoring" shifts of the keyed alphabet against itself? There are several schemes for this. We will start with the one that is fastest for a computer to calculate, although it takes the longest to describe; it is based on the "cross-correlation index". See the article by ANCHISES in the SO2003 Cryptogram for a description of the cross-correlation index and its application to solving Quagmire I ciphers. The first step is to list the frequencies of the ciphertext letters in each of the 9 shifted alphabets, in the order of the keyed alphabet.

Ciphertext Letter Frequencies

C O R N A B D E F G H I J K L M P Q S T U V W X Y Z

1 0 0 0 0 4 0 0 2 4 1 0 0 0 0 1 3 3 1 1 4 0 0 2 0 1 <= alphabet 1

0 0 2 0 1 3 0 1 3 2 1 2 0 3 1 1 0 2 0 0 0 4 1 1 0 0 <= alphabet 2

1 3 0 2 0 0 0 1 4 0 0 1 2 0 1 0 0 0 2 4 1 3 0 0 0 2 <= alphabet 3

4 0 0 2 1 0 0 1 0 0 2 4 0 2 0 1 1 1 0 1 1 2 0 1 1 2 <= alphabet 4

0 0 0 2 0 3 1 0 0 0 1 0 2 1 0 0 0 1 4 1 0 3 5 2 0 1 <= alphabet 5

0 1 4 0 1 0 0 0 1 1 1 3 1 0 1 0 1 5 0 0 1 4 1 0 1 0 <= alphabet 6

0 0 7 0 1 1 2 1 1 1 1 4 1 0 0 0 0 0 0 0 2 3 1 0 1 0 <= alphabet 7

1 0 0 0 2 2 1 0 3 3 0 0 1 0 0 0 7 0 2 0 0 0 2 0 1 2 <= alphabet 8

1 2 1 0 0 0 2 0 3 0 1 1 1 3 0 0 0 0 0 4 2 1 0 4 0 1 <= alphabet 9

We choose one of the 9 alphabets as a "base" and for each of the other 8 we calculate the sums of the products of the letter frequencies of the base and the other alphabet for each of the 26 shifts. For example, taking the top alphabet in the above table as the base, and the one below it as the other alphabet, the sums of the products for each shift are:

Sums of products of frequencies for alphabets 1 and 2 for each of the possible shifts

0 36 8 39 16 38 24 271 32 9 15 17 31 25 28

2 26 10 15 18 29

3 30 11 32 19 34

4 36 12 41 20 27

5 39 13 30 21 26

6 27 14 39 22 32

7 11 15 26 23 38

The maximum cross product is 41, which occurs at a shift of 12.

With alphabet 1 as the base, the shifts of the 8 other alphabets which give the maximum cross-products are:

1 2 3 4 5 6 7 8 9

0 12 3 6 13 12 12 25 3

We could, of course, choose these shifts as our "highest scoring" shifts. But we might as well try all the other alphabets as bases. To compare the resulting lists with each other, we subtract (mod 26) the shift of the first alphabet from the others in its list, making the first shift equal to zero for all the bases. Here's the result:

1 2 3 4 5 6 7 8 9

0 12 3 6 13 12 12 25 3 <= alphabet 1 as base

0 12 18 2 13 12 19 7 8 <= alphabet 2 as base

0 23 3 21 13 12 6 0 23 <= alphabet 3 as base

0 16 14 6 24 23 23 11 14 <= alphabet 4 as base

0 12 3 21 13 12 19 0 14 <= alphabet 5 as base

0 12 3 21 13 12 12 0 14 <= alphabet 6 as base

0 5 9 21 6 12 12 0 3 <= alphabet 7 as base

0 4 2 20 12 11 11 25 2 <= alphabet 8 as base

0 23 9 21 2 1 12 0 3 <= alphabet 9 as base

As our "highest scoring" shifts we choose the shifts that are most common in the above table for each of the alphabets.

The result is:

1 2 3 4 5 6 7 8 9 <= alphabet

0 12 3 21 13 12 12 0 3 <= most common shift.

Aligning the key block according to these shifts we get:

a b c d e f g h i j k l m n o p q r s t u v w x y z

C O R N A B D E F G H I J K L M P Q S T U V W X Y Z (0)

J K L M P Q S T U V W X Y Z C O R N A B D E F G H I (12)

N A B D E F G H I J K L M P Q S T U V W X Y Z C O R (3)

V W X Y Z C O R N A B D E F G H I J K L M P Q S T U (21)

K L M P Q S T U V W X Y Z C O R N A B D E F G H I J (13)

J K L M P Q S T U V W X Y Z C O R N A B D E F G H I (12)

J K L M P Q S T U V W X Y Z C O R N A B D E F G H I (12)

C O R N A B D E F G H I J K L M P Q S T U V W X Y Z (0)

N A B D E F G H I J K L M P Q S T U V W X Y Z C O R (3)

We have put the lower case letters in alphabetical order in a row just above the key block. Now that we have our key block, consisting of the highest scoring shifts, the next step is to "decode" the ciphertext using above array as if it were the Quagmire IV key, getting the string of letters:

pdsafkiqsuxwpksafkijspsaqwurtpqizqmuibfdkpzikszsaiqwjzucxfsjzqdfwqejizqhktjqxfsyfjsfmjfigqu

vyffwqeyrtizefmadfzqeaeyfkjkacqjzqrulyqijjqrqbpfgqjzujjlqsmuiyfywvtqxqqjqfsmzqeizjrffcqwwff

eujzqsxqjjjzqtiqqwqwjfvqurwfijfkjfxbayzjjzqtfqsqyqjjaxyifxusfxq

If our trial key CORN is the actual codetext key and if our shifts are the correct ones then this string is a simple substitution cipher whose solution is the plaintext of the Quagmire IV.

A couple of observations at this point: (1) The procedure for getting the highest scoring shifts may seem complicated, but it can be carried out very quickly on a computer. Also we always get the same shifts no matter how many times we repeat the procedure -- there is no element of chance or random choice in getting these shifts. (2) We don't have time to try solving the trial simple substitution ciphers. The trial key CORN may be one of thousands or even millions of trial keys on our list. Trying to solve the resulting simple substitution ciphers by hill-climbing would take much too long. We need a quick way of scoring a trial simple substitution cipher.

After some experimenting, the best way of scoring simple ciphers that I've found is to use the weighted sum of the index of coincidence and the digraphic index of coincidence. For weights multiply the index of coincidence by 1000 and the digraphic index of coincidence by 10000. For a definition of the index of coincidence and the digraphic index of coincidence, see the notes at the end of __the Statistical Reference Guide to the ACA ciphers__ elsewhere on this web site.

For our example trial cipher, repeating it here in upper case:

PDSAFKIQSUXWPKSAFKIJSPSAQWURTPQIZQMUIBFDKPZIKSZSAIQWJZUCXFSJZQDFWQEJIZQHKTJQXFSYFJSFMJFIGQU

VYFFWQEYRTIZEFMADFZQEAEYFKJKACQJZQRULYQIJJQRQBPFGQJZUJJLQSMUIYFYWVTQXQQJQFSMZQEIZJRFFCQWWFF

EUJZQSXQJJJZQTIQQWQWJFVQURWFIJFKJFXBAYZJJZQTFQSQYQJJAXYIFXUSFXQ

we have: 1000 times index of coincidence = 69, and 10000 times digraphic index of coincidence = 64. So the IC+DIC score is 133. (Actually it's 134 if you take the sum before rounding off).

Using IC+DIC to score the trial simple ciphers, my 2.7 Ghz celeron processed a list of 61000 trial keys in about 1 minute. The highest scoring key, at 134, was indeed, CORN. At this rate, processing a list with even a million words or phrases takes only about 15 minutes.

This method seems reliable if the ciphertext is long enough (300 or more letters) so that the simple cipher with the highest IC+DIC score will be the correct one, and the period is short enough (30 or more letters per shifted alphabet) so that the cross-correlation index gets the correct shifts.

Our example does not quite meet these conditions, with a length of 245, and with 27-28 letters per shifted alphabet. But the highest scoring key, CORN, is indeed the correct one. However if you solve the simple substitution cipher, you will not get the precise plaintext, but it will be clear what the correct plaintext should be. Also, if you examine the columns in the trial key block you will see MOSHROOMS. It's not much of a stretch to see that the shift for alphabet 2 is wrong and that the indicator key is MUSHROOMS.

Sometimes the cipher is too short, the period is too big, or something else is wrong, and the method described above doesn't work -- the highest scoring trial key is not the actual codetext key. To add a second pass at a solution, you can try some alternate methods of finding the best key block shifts. Unfortunately, these methods are too slow to go through a list of millions of words and phrases. To get around the speed problem, during the first pass at a solution, I save not just the top scoring trial key but the 5000 highest scoring keys. Working on this filtered list of 5000 words and phrases you don't need scoring methods that are so fast.

One alternate method of finding the correct shifts for a trial key is to use hill-climbing in an attempt to maximize the IC+DIC score of the corresponding simple substitution cipher. We determine the original shifts as in the first pass, but then choose an alternate shift for a random alphabet and if it improves the IC+DIC score we keep it, and if it doesn't improve the score, we restore the original shift, and so on for the number of hill-climbing steps we've decided on. I've tried two methods of choosing alternate shifts. In the first we restrict our choice of shifts to those that were best for at least one of the base alphabets. If restrict our choices in this way, we don't need too many hill-climbing steps -- say 100 per trial key. At that rate we can try the 5000 keys in our list in about the same amount of time that it took to go through the original list of 60000 during the first pass-- about one minute.

In our example with the trial key CORN, the choice of shifts for the second alphabet is: 12,23,16,5,4. After 100 hill-climbing steps, the best scoring shifts were:

1 2 3 4 5 6 7 8 9 <= alphabet

0 5 3 21 13 12 12 0 3 <= shift

and the corresponding simple substitution cipher was:

PKSAFKIQSUEWPKSAFKIQSPSAQWURAPQIZQMUIIFDKPZIKSGSAIQWJZUJXFSJZQDFDQEJIZQHKAJQXFSYFJZFMJFIGQU

CYFFWQEYRAIZEFMADFGQEAEYFKJRACQJZQRUSYQIJJQRQIPFGQJZUJQLQSMUIYFFWVTQXQQJXFSMZQEIZQRFFCQWWFM

EUJZQSXQQJJZQTIQQDQWJFVQURDFIJFKJFXIAYZJJZQTMQSQYQJJAEYIFXUSFXX

This cipher has an IC+DIC score of 142. This was the highest scoring simple cipher among the 5000 candidates, and in fact, solving it gives the correct Quagmire IV plaintext.

Sometimes restricting our shift choices to those than were best for some base alphabet doesn't work, and we have to make a third pass in which we simply choose alternate shifts at random. In this case we need more hill-climbing steps so this method is slow. Also during the third pass, following a suggestion of Scryer, I've added bonus points to the IC+DIC score if the key block contains a column that is a known word or phrase. Searching through all the columns for words or phrases also slows things down, of course. With 500 hill-climbing steps per trial key, the third pass takes about 2 minutes to go through the list of 5000.

I've now tested this method on several hundred computer generated ciphers, as well as on all the Quagmire IV's without cribs that I could find in the Cryptogram. The vast majority of the ciphers were decided at stage 1 -- either the top scoring key was correct or else it didn't make the top 5000. In about 10 percent of the ciphers the solution was found on the second or third pass.