Simple Substitution Ciphers

Authors: Chris Savarese and Brian Hart '99

These hieroglyphics have evidently a meaning. If it is a purely arbitrary one, it may be impossible for us to solve it. If, on the other hand, it is systematic, I have no doubt that we shall get to the bottom of it. -- Sherlock Holmes in The Adventure of the Dancing Men
A cipher is a method for encrypting a message -- i.e., for transforming the message into one that can't be easily read. The original message is called the plaintext or clear and the encrypted message is called a cryptogram or ciphertext. A substitution cipher is one in which each letter of the plaintext is replaced by some other symbol. Usually the replacement symbols are themselves letters of the alphabet, but this needn't always be the case, as we see in The Dancing Men, where the replacement symbols were hieroglyphics.

An alphabet is an ordered set of symbols. For example, the normal English alphabet consists of the symbols {A,B,C,...,Z}. A simple substitution is one in which each letter of the plaintext is always replaced by the same ciphertext symbol. In other words, there is a 1-1 relationship between the letters of the plaintext and the ciphertext alphabets.
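The 1-1 relationship can be sketched in a few lines of Python. This is only an illustration: the ciphertext alphabet below (the letters in keyboard order) is a hypothetical key chosen for the example, not one used elsewhere on this page, and the function names are our own.

```python
import string

PLAIN_ALPHABET = string.ascii_uppercase            # A..Z in normal order
# Hypothetical ciphertext alphabet: any rearrangement of A..Z would do.
CIPHER_ALPHABET = "QWERTYUIOPASDFGHJKLZXCVBNM"

# A valid key uses every letter exactly once (the 1-1 relationship).
assert sorted(CIPHER_ALPHABET) == list(PLAIN_ALPHABET)

ENCRYPT_TABLE = str.maketrans(PLAIN_ALPHABET, CIPHER_ALPHABET)
DECRYPT_TABLE = str.maketrans(CIPHER_ALPHABET, PLAIN_ALPHABET)


def encrypt(plaintext):
    """Replace each plaintext letter with its fixed ciphertext letter."""
    return plaintext.upper().translate(ENCRYPT_TABLE)


def decrypt(ciphertext):
    """Undo the substitution by mapping each letter back."""
    return ciphertext.upper().translate(DECRYPT_TABLE)


print(encrypt("HELLO"))           # ITSSG
print(decrypt(encrypt("HELLO")))  # HELLO
```

Because every plaintext letter always maps to the same ciphertext letter, the two L's in HELLO both become S. Characters outside A-Z, such as spaces, pass through unchanged.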

For the normal English alphabet, how many different ciphertext alphabets can we get if we use the same letters? In other words, in how many different ways can we permute or rearrange the English alphabet? The answer is 26!. That's approximately 4 x 10^26, i.e., the number 4 followed by 26 zeros. To understand how we got that number, imagine that you are given the task of making an arbitrary permutation of the English alphabet. You have to make 26 choices. On the first choice you can choose any one of the 26 letters in the alphabet. On the second choice you can choose any one of the remaining 25 letters. On the third choice you can choose any one of the remaining 24 letters. And so on. On the last choice, there is just one letter remaining. So, in all there are 26! = 26 x 25 x 24 x ... x 1 different ways to make these choices.
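The counting argument above can be checked with a short Python sketch that multiplies the number of choices available at each step. The function name is ours, introduced only for illustration.

```python
import math


def count_permutations(n):
    """Multiply the choices available at each step: n, n-1, ..., 1."""
    count = 1
    for choices_left in range(n, 0, -1):
        count *= choices_left
    return count


total = count_permutations(26)
print(total)           # the exact value of 26!
print(f"{total:.2e}")  # 4.03e+26, roughly a 4 followed by 26 zeros
```

The loop gives the same result as Python's built-in `math.factorial(26)`.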

Although there are 26! possible ciphertext alphabets, any fan of puzzle books or newspaper cryptograms knows that simple substitution ciphers are relatively easy to break by hand by analyzing letter frequencies and guessing at common words. The nine most frequent letters in English are E, T, N, A, O, R, I, S, and H. The five letters that occur least often are J, K, Q, X, and Z. Generally, we would need a message of considerable length in order to make very good use of our knowledge of letter frequencies. For example, consider the following secret message:

TK IL KQ JKT TK IL TBST CR TBL OULRTCKJ

In this message the most frequent letter is 'T'. If we assume that T=E, this gives
E- -- -- --E E- -- E--E -- E-- ----E---

which isn't very helpful. One problem in this case is the pattern E- and the pattern E--E. Since there are relatively few two-letter English words beginning with E, this throws our hypothesis that T=E into doubt. Similarly, there aren't many English words that would fit the E--E pattern. Can you think of any?
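The letter count that led to the T=E guess can be reproduced with a minimal Python sketch. The helper name is an assumption made for this example.

```python
from collections import Counter

CIPHERTEXT = "TK IL KQ JKT TK IL TBST CR TBL OULRTCKJ"


def letter_counts(text):
    """Count how often each letter occurs, ignoring spaces."""
    return Counter(ch for ch in text if ch.isalpha())


counts = letter_counts(CIPHERTEXT)
print(counts.most_common(3))  # [('T', 7), ('K', 5), ('L', 4)]
```

With only 30 letters in the message, these counts are a weak guide: T is the most frequent ciphertext letter, yet it turns out not to stand for E.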

Another kind of knowledge that we can use to solve this cryptogram is that the most frequent two-letter words in English are:

OF TO IN IS IT BE BY HE AS ON AT OR AN SO IF NO

Since there are so many two-letter words in the message that begin or end with K, perhaps a better hypothesis would be that K=O. If we try this substitution, we get
-O -- O- -O- -O -- ---- -- --- ------O-

Since the second most frequent letter in English is T, perhaps another useful hypothesis would be that T=T -- i.e., that T stands for itself. That would give us
TO -- O- -OT TO -- T--T -- T-- ----T-O-

which is starting to look a bit more promising. Note in this case the pattern T--. The most common three-letter word in English that starts with a T is THE. If we make the guess that B=H and L=E, we now get
TO -E O- -OT TO -E TH-T -- THE --E-T-O-

This is starting to look better. The pattern TH-T looks very much like the word THAT. The pattern -OT looks very much like the word NOT. If we make the additional guesses that S=A and J=N we get
TO -E O- NOT TO -E THAT -- THE --E-T-ON

The last word in the message ends in the pattern T-ON, which looks very much like the pattern TION. If we make the guess that C=I, we get
TO -E O- NOT TO -E THAT I- THE --E-TION

We now have something that looks very much like something Hamlet might say:
TO BE OR NOT TO BE THAT IS THE QUESTION
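The sequence of guesses above can be replayed with a short Python sketch. The guesses are taken from the text, in the order they were made; the final group (I=B, Q=R, R=S, O=Q, U=U) is read off from the completed plaintext. The function and variable names are our own.

```python
CIPHERTEXT = "TK IL KQ JKT TK IL TBST CR TBL OULRTCKJ"

# Each step adds ciphertext -> plaintext guesses, in the order made above.
STEPS = [
    {"K": "O"},
    {"T": "T"},
    {"B": "H", "L": "E"},
    {"S": "A", "J": "N"},
    {"C": "I"},
    {"I": "B", "Q": "R", "R": "S", "O": "Q", "U": "U"},
]


def apply_guesses(ciphertext, guesses):
    """Show guessed letters as plaintext and unsolved letters as '-'."""
    shown = []
    for ch in ciphertext:
        if ch.isalpha():
            shown.append(guesses.get(ch, "-"))
        else:
            shown.append(ch)  # keep the word boundaries
    return "".join(shown)


def replay(ciphertext, steps):
    """Apply the guesses cumulatively, recording the pattern after each step."""
    guesses = {}
    history = []
    for step in steps:
        guesses.update(step)
        history.append(apply_guesses(ciphertext, guesses))
    return history


for line in replay(CIPHERTEXT, STEPS):
    print(line)
```

Each printed line matches one of the partial solutions shown above, ending with the full plaintext.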

As this example shows, even though there are 26! ways to create a simple substitution cryptogram, we can usually crack even very short messages by making judicious use of our knowledge of English. This includes knowledge of letter and word frequencies and of pattern words such as 'the' and 'that', combined with a series of guesses of the form "the ciphertext letter 'K' is the plaintext letter 'O'". There are simple ways to make simple substitution cryptograms more difficult. One way is to remove the word boundaries. For example, if the above message were written as:
TKILK QJKTT KILTB STCRT BLOUL RTCKJ

it would be much more difficult to use our knowledge of two- and three-letter words to solve the cryptogram. The encrypted message is more secure.
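Removing the word boundaries is easy to do mechanically. The Python sketch below strips the spaces and regroups the letters in blocks of five, as in the example above; the function name and the default block size are assumptions made for illustration.

```python
CIPHERTEXT = "TK IL KQ JKT TK IL TBST CR TBL OULRTCKJ"


def regroup(text, size=5):
    """Drop the word boundaries and rewrite the letters in fixed-size blocks."""
    letters = "".join(ch for ch in text if ch.isalpha())
    blocks = [letters[i:i + size] for i in range(0, len(letters), size)]
    return " ".join(blocks)


print(regroup(CIPHERTEXT))  # TKILK QJKTT KILTB STCRT BLOUL RTCKJ
```

The letters and their frequencies are unchanged, so frequency analysis still applies; only the word-length clues are hidden.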

For Further Study and Enjoyment

  • Cryptogram Tool. Try your hand at deciphering simple substitution cryptograms with the help of this simple Java applet. (Requires a Java-compatible browser.)

  • CryptoToolJ. Try using CryptoToolJ to create and analyze your own simple substitution cryptograms.

  • Sherlock Holmes. One of the best accounts of solving a simple substitution cryptogram is the Sherlock Holmes story The Adventure of the Dancing Men, in which Holmes explains in detail how one solves such a cryptogram.

  • Edgar Allan Poe. Edgar Allan Poe had an intense interest in cryptography and believed that breaking ciphers and other enigmas only required the straightforward application of reason and logic. According to David Kahn, author of The Codebreakers, Poe's story The Gold-Bug "remains unequaled as a work of fiction turning upon a secret message." Visit the Poe page on this site.