These hieroglyphics have evidently a meaning. If it is a purely arbitrary one, it may be impossible for us to solve it. If, on the other hand, it is systematic, I have no doubt that we shall get to the bottom of it. -- Sherlock Holmes in *The Adventure of the Dancing Men*

An *alphabet* is an ordered set of symbols. For example, the normal English alphabet consists of the symbols {A,B,C,...,Z}. A *simple* substitution is one in which each letter of the plaintext is always replaced by the same ciphertext symbol. In other words, there is a 1-1 relationship between the letters of the plaintext and the ciphertext alphabets.
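To make the definition concrete, here is a minimal sketch of a simple substitution in Python. The ciphertext alphabet `CIPHER` (the keyboard order) and the function names are assumptions chosen for illustration; any permutation of the 26 letters would serve equally well.

```python
import string

PLAIN = string.ascii_uppercase          # A..Z, the plaintext alphabet
# Hypothetical ciphertext alphabet: an arbitrary permutation of PLAIN.
CIPHER = "QWERTYUIOPASDFGHJKLZXCVBNM"

# Each plaintext letter always maps to the same ciphertext letter (1-1).
ENCRYPT_TABLE = str.maketrans(PLAIN, CIPHER)
DECRYPT_TABLE = str.maketrans(CIPHER, PLAIN)


def encrypt(plaintext):
    """Replace every letter with its partner in the ciphertext alphabet."""
    return plaintext.upper().translate(ENCRYPT_TABLE)


def decrypt(ciphertext):
    """Undo the substitution by applying the inverse mapping."""
    return ciphertext.upper().translate(DECRYPT_TABLE)


secret = encrypt("HELLO WORLD")
print(secret)            # ITSSG VGKSR
print(decrypt(secret))   # HELLO WORLD
```

Because the mapping is 1-1, decryption is just the same table read in the opposite direction. Characters outside the alphabet, such as spaces, pass through unchanged.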

For the normal English alphabet, how many different ciphertext
alphabets can we get if we use the same letters? In other words, in
how many different ways can we *permute* or rearrange the English
alphabet? The answer is 26!, which is approximately equal to the number
4 followed by 26 zeros. To understand how we got that number, imagine
that you are given the task of making an arbitrary permutation of the
English alphabet. You have to make 26 choices. On the first choice you
can choose any one of the 26 letters in the alphabet. On the second
choice you can choose any one of the remaining 25 letters. On the
third choice you can choose any one of the remaining 24 letters. And
so on. On the last choice, there is just one letter remaining. So, in
all there are 26! = 26 x 25 x 24 x ... x 1 different ways to make
these choices.
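The counting argument can be checked with a few lines of Python. This is only a sketch: the function name `count_permutations` is made up for illustration, and the loop simply multiplies the number of choices available at each step.

```python
import math


def count_permutations(n):
    """Multiply the choices left at each step: n x (n-1) x ... x 1."""
    total = 1
    for choices_left in range(n, 0, -1):
        total *= choices_left
    return total


alphabet_size = 26
total = count_permutations(alphabet_size)

print(total)              # 403291461126605635584000000
print(f"{total:.2e}")     # 4.03e+26
print(total == math.factorial(alphabet_size))   # True
```

The result has 27 digits and begins with 4, which matches the estimate of a 4 followed by 26 zeros.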

Although there are 26! possible ciphertext alphabets, any fan of
puzzle books or newspaper cryptograms knows that simple substitution
ciphers are relatively easy to break by hand by analyzing letter
frequencies and guessing at common words. The nine most frequent
letters in English are E, T, N, A, O, R, I, S, and H. The five letters that
occur least often are J, K, Q, X, and Z. Generally, we would need a
message of considerable length in order to make very good use of our
knowledge of letter frequencies. For example, consider the following
secret message:

In this message the most frequent letter is 'T'. If we assume that T=E, this gives

which isn't very helpful. One problem in this case is the pattern E- and the pattern E--E. Since there are relatively few two-letter English words beginning with E, this throws our hypothesis that T=E into doubt. Similarly, there aren't many English words that would fit the E--E pattern. Can you think of any?
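The two steps just described, counting letters and then testing a guess, can be sketched in Python. The ciphertext `sample` below is a made-up example and is not the secret message discussed in the text. The helper names are likewise assumptions for illustration.

```python
from collections import Counter


def letter_counts(ciphertext):
    """Count how often each letter occurs, ignoring spaces and punctuation."""
    return Counter(ch for ch in ciphertext.upper() if ch.isalpha())


def apply_guesses(ciphertext, guesses):
    """Show guessed plaintext letters; letters not yet guessed become '-'."""
    shown = []
    for ch in ciphertext.upper():
        if ch.isalpha():
            shown.append(guesses.get(ch, "-"))
        else:
            shown.append(ch)      # keep spaces so word patterns stay visible
    return "".join(shown)


# Hypothetical ciphertext, used only to demonstrate the helpers.
sample = "PHHW PH DW WKH SDUN"

counts = letter_counts(sample)
most_common_letter = counts.most_common(1)[0][0]

# First hypothesis: the most frequent ciphertext letter stands for E.
partial = apply_guesses(sample, {most_common_letter: "E"})

print(most_common_letter)   # H
print(partial)              # -EE- -E -- --E ----
```

Printing the partial result after each guess makes it easy to judge whether the exposed word patterns look like plausible English, which is the test applied to the T=E hypothesis above.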

Another kind of knowledge that we can use to solve this cryptogram is
that the most frequent two-letter words in English are:

Since there are so many two-letter words in the message that begin or end with K, perhaps a better hypothesis would be that K=O. If we try this substitution, we get

Since the second most frequent letter in English is T, perhaps another useful hypothesis would be that T=T -- i.e., that T stands for itself. That would give us

which is starting to look a bit more promising. Note in this case the pattern T--. The most common three-letter word in English that starts with a T is THE. If we make the guess that B=H and L=E, we now get

This is starting to look better. The pattern TH-T looks very much like the word THAT. The pattern -OT looks very much like the word NOT. If we make the additional guesses that S=A and J=N, we get

The last word in the message ends in the pattern T-ON, which looks very much like the ending TION. If we make the guess that C=I, we get

We now have something that looks very much like something Hamlet might say:

As this example shows, even though there are 26! ways to create a simple substitution cryptogram, we can usually crack even very short messages by making judicious use of our knowledge of English, including knowledge of letter and word frequencies and of pattern words such as 'the' and 'that', and by making a series of guesses of the form "the ciphertext letter K is the plaintext letter O".

There are simple ways to make simple substitution cryptograms more difficult to solve. One way is to remove the word boundaries. For example, if the above message were written as:

it would be much more difficult to use our knowledge of two- and three-letter words to solve the cryptogram. The encrypted message is therefore more secure.
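Removing the word boundaries is a one-line operation. The sketch below uses a made-up ciphertext, not the message from the text, and a hypothetical helper name.

```python
def remove_word_boundaries(ciphertext):
    """Drop all whitespace so that word lengths are no longer visible."""
    return "".join(ciphertext.split())


# Hypothetical ciphertext, used only for illustration.
sample = "PHHW PH DW WKH SDUN"

run_together = remove_word_boundaries(sample)
print(run_together)   # PHHWPHDWWKHSDUN

# Word lengths are visible before, but not after.
print([len(word) for word in sample.split()])         # [4, 2, 2, 3, 4]
print([len(word) for word in run_together.split()])   # [15]
```

Letter frequencies are unchanged by this step, so frequency analysis still applies. Only the clues that come from short words and word patterns are lost.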