One of the main problems with simple
substitution ciphers is that they are so vulnerable to *frequency analysis*. Given a sufficiently
large ciphertext, it can easily be broken by mapping the frequency of
its letters to the known frequencies of, say, English text.
Therefore, to make ciphers more secure, cryptographers have long been
interested in developing enciphering techniques that are immune to
frequency analysis. One of the most common approaches is to suppress
the normal frequency data by using more than one alphabet to encrypt
the message. A *polyalphabetic substitution
cipher* involves the use of two or more cipher
alphabets. Instead of there being a one-to-one relationship between
each letter and its substitute, there is a one-to-many relationship
between each letter and its substitutes.

The following table is the Vigenere tableau. The letter at the left of each row labels that row. Note that each row of the table corresponds to a Caesar cipher: the first row is a shift of 0, the second is a shift of 1, and the last is a shift of 25.

           A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    A  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    B  B C D E F G H I J K L M N O P Q R S T U V W X Y Z A
    C  C D E F G H I J K L M N O P Q R S T U V W X Y Z A B
    D  D E F G H I J K L M N O P Q R S T U V W X Y Z A B C
    E  E F G H I J K L M N O P Q R S T U V W X Y Z A B C D
    F  F G H I J K L M N O P Q R S T U V W X Y Z A B C D E
    G  G H I J K L M N O P Q R S T U V W X Y Z A B C D E F
    H  H I J K L M N O P Q R S T U V W X Y Z A B C D E F G
    I  I J K L M N O P Q R S T U V W X Y Z A B C D E F G H
    J  J K L M N O P Q R S T U V W X Y Z A B C D E F G H I
    K  K L M N O P Q R S T U V W X Y Z A B C D E F G H I J
    L  L M N O P Q R S T U V W X Y Z A B C D E F G H I J K
    M  M N O P Q R S T U V W X Y Z A B C D E F G H I J K L
    N  N O P Q R S T U V W X Y Z A B C D E F G H I J K L M
    O  O P Q R S T U V W X Y Z A B C D E F G H I J K L M N
    P  P Q R S T U V W X Y Z A B C D E F G H I J K L M N O
    Q  Q R S T U V W X Y Z A B C D E F G H I J K L M N O P
    R  R S T U V W X Y Z A B C D E F G H I J K L M N O P Q
    S  S T U V W X Y Z A B C D E F G H I J K L M N O P Q R
    T  T U V W X Y Z A B C D E F G H I J K L M N O P Q R S
    U  U V W X Y Z A B C D E F G H I J K L M N O P Q R S T
    V  V W X Y Z A B C D E F G H I J K L M N O P Q R S T U
    W  W X Y Z A B C D E F G H I J K L M N O P Q R S T U V
    X  X Y Z A B C D E F G H I J K L M N O P Q R S T U V W
    Y  Y Z A B C D E F G H I J K L M N O P Q R S T U V W X
    Z  Z A B C D E F G H I J K L M N O P Q R S T U V W X Y
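Because every row is just the alphabet rotated by a fixed amount, the table can be generated rather than stored. The following is a minimal sketch; the names `tableau_row` and `TABLEAU` are illustrative and not part of the original text.

```python
import string

ALPHABET = string.ascii_uppercase

def tableau_row(shift: int) -> str:
    """Return the alphabet rotated left by `shift` places (a Caesar shift)."""
    return ALPHABET[shift:] + ALPHABET[:shift]

# Map each row label (A..Z) to its shifted alphabet.
TABLEAU = {ALPHABET[s]: tableau_row(s) for s in range(26)}

print(TABLEAU["A"])  # row for a shift of 0
print(TABLEAU["B"])  # row for a shift of 1
print(TABLEAU["Z"])  # row for a shift of 25
```

Looking up a cell is then `TABLEAU[row_letter][ALPHABET.index(column_letter)]`.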

The Vigenere cipher uses this table together with a keyword to encipher a message. For example, suppose we wish to encipher the plaintext message:

    TO BE OR NOT TO BE THAT IS THE QUESTION

using the keyword RELATIONS. We begin by writing the keyword, repeated as many times as necessary, above the plaintext message. To derive the ciphertext using the tableau, for each letter in the plaintext, one finds the intersection of the row given by the corresponding keyword letter and the column given by the plaintext letter itself; the letter at that intersection is the ciphertext letter. For example, the first plaintext letter, T, is paired with the keyword letter R, and row R meets column T at K.

    Keyword:    RELAT IONSR ELATI ONSRE LATIO NSREL
    Plaintext:  TOBEO RNOTT OBETH ATIST HEQUE STION
    Ciphertext: KSMEH ZBBLK SMEMP OGAJX SEJCS FLZSY

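To decipher, the lookup is reversed: for each ciphertext letter, find it in the row given by the corresponding keyword letter; the letter heading that column is the plaintext letter.

    Keyword:    RELAT IONSR ELATI ONSRE LATIO NSREL
    Ciphertext: KSMEH ZBBLK SMEMP OGAJX SEJCS FLZSY
    Plaintext:  TOBEO RNOTT OBETH ATIST HEQUE STION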
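Since row k of the tableau is a Caesar shift by k, the table lookup amounts to adding the keyword letter to the plaintext letter modulo 26, and subtracting it to decipher. The sketch below assumes the text and keyword contain only the uppercase letters A-Z with spaces already removed; the function name `vigenere` is illustrative.

```python
from itertools import cycle

A = ord("A")

def vigenere(text: str, keyword: str, decrypt: bool = False) -> str:
    """Encipher (or decipher) uppercase A-Z text with a repeating keyword."""
    sign = -1 if decrypt else 1
    out = []
    # cycle() repeats the keyword as many times as the text requires.
    for ch, k in zip(text, cycle(keyword)):
        shift = sign * (ord(k) - A)
        out.append(chr((ord(ch) - A + shift) % 26 + A))
    return "".join(out)

plaintext = "TOBEORNOTTOBETHATISTHEQUESTION"
ciphertext = vigenere(plaintext, "RELATIONS")
print(ciphertext)  # KSMEHZBBLKSMEMPOGAJXSEJCSFLZSY
print(vigenere(ciphertext, "RELATIONS", decrypt=True))  # TOBEORNOTTOBETHATISTHEQUESTION
```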

The strength of the Vigenere cipher against frequency analysis can be seen by examining the above ciphertext. Note that there are 7 'T's in the plaintext message and that they have been encrypted by 'K,' 'L,' 'K,' 'M,' 'G,' 'X,' and 'L' respectively, so a single plaintext letter is spread over five different ciphertext letters. This successfully masks the frequency characteristics of the English 'T.' One way of looking at this is to notice that each letter of our keyword RELATIONS picks out 1 of the 26 possible substitution alphabets given in the Vigenere tableau. Thus, any message encrypted by a Vigenere cipher is a collection of as many simple substitution ciphers as there are letters in the keyword.
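The one-to-many relationship can be checked directly by pairing each plaintext letter with its ciphertext letter. This short sketch uses the example strings from above with the spaces removed; the variable names are illustrative.

```python
plaintext = "TOBEORNOTTOBETHATISTHEQUESTION"
ciphertext = "KSMEHZBBLKSMEMPOGAJXSEJCSFLZSY"

# Collect, in order, the ciphertext letter that stands in for each plaintext 'T'.
substitutes = [c for p, c in zip(plaintext, ciphertext) if p == "T"]
print(substitutes)       # ['K', 'L', 'K', 'M', 'G', 'X', 'L']
print(set(substitutes))  # the distinct substitutes used for 'T'
```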

Although the Vigenere cipher has all the features of a useful field cipher -- i.e., an easily transportable key and tableau, no need for special apparatus, ease of application, etc. -- it did not catch on in its day. A variation of it, known as the Gronsfeld cipher, did catch on in Germany and was widely used in Central Europe. The Gronsfeld variant used the digits of a key number instead of the letters of a keyword, but remained unchanged in all other respects. The Gronsfeld is therefore a weaker technique than the Vigenere, since it uses only 10 substitute alphabets (one per digit 0..9) instead of the 26 used by the Vigenere.
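In code, the Gronsfeld variant differs only in where the shift comes from: each digit of the key number is used directly as the shift, which is the same as restricting the Vigenere keyword to the letters A through J. A minimal sketch, assuming uppercase A-Z text; the key number `31415` and the function name are made up for illustration.

```python
from itertools import cycle

A = ord("A")

def gronsfeld(text: str, keynumber: str, decrypt: bool = False) -> str:
    """Shift each uppercase letter by the matching digit of a repeating key number."""
    sign = -1 if decrypt else 1
    return "".join(
        chr((ord(ch) - A + sign * int(d)) % 26 + A)
        for ch, d in zip(text, cycle(keynumber))
    )

secret = gronsfeld("TOBEORNOTTOBE", "31415")
print(secret)  # WPFFTUOSUYRCI
print(gronsfeld(secret, "31415", decrypt=True))  # TOBEORNOTTOBE
```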

Vigenere-like substitution ciphers were regarded by many as practically unbreakable for 300 years. In 1863, a Prussian major named Kasiski proposed a method for breaking a Vigenere cipher that consisted of finding the length of the keyword and then dividing the message into that many simple substitution cryptograms. Frequency analysis could then be used to solve the resulting simple substitutions.

Kasiski's technique for finding the length of the keyword was based on measuring the distance between repeated bigrams in the ciphertext. Note that in the above cryptogram the plaintext bigram 'TO' occurs twice in the message, at positions 0 and 9, and in both cases it lines up perfectly with the first two letters of the keyword. Because of this it produces the same ciphertext bigram, 'KS.' The same can be said of plaintext 'BE,' which occurs twice, starting at positions 2 and 11, and is also encrypted with the same ciphertext bigram, 'ME.' In fact, any message encrypted with a Vigenere cipher will produce many such repeated bigrams. Although not every repeated bigram will be the result of the encryption of the same plaintext bigram, many will, and this provides the basis for breaking the cipher. By measuring and factoring the distances between recurring bigrams -- in this case the distance is 9 -- Kasiski was able to guess the length of the keyword. For this example,

    Location:   01234 56789 01234 56789 01234 56789
    Keyword:    RELAT IONSR ELATI ONSRE LATIO NSREL
    Plaintext:  TOBEO RNOTT OBETH ATIST HEQUE STION
    Ciphertext: KSMEH ZBBLK SMEMP OGAJX SEJCS FLZSY

the Kasiski method would create something like the following list:

| Repeated Bigram | Location of Repeat | Distance | Factors |
|---|---|---|---|
| KS | 9 | 9 | 3, 9 |
| SM | 10 | 9 | 3, 9 |
| ME | 11 | 9 | 3, 9 |
| ... | | | |

Factoring the distances between repeated bigrams is a way of identifying possible keyword lengths. The factors that occur most frequently are the best candidates for the length of the keyword. In this example every distance is 9, whose factors greater than 1 are 3 and 9, so both 3 and 9 are reasonable candidates for the keyword length. Although we don't have a clear favorite, we've narrowed down the possibilities to a very small list. Note also that if a longer ciphertext were encrypted with the same keyword ('RELATIONS'), we would expect to find repeated bigrams at distances that are multiples of 9 -- e.g., 18, 27, 81, etc. These would also have 3 as a factor. Kasiski's important contribution was to note this phenomenon of repeated bigrams and to propose a method -- factoring of distances -- to analyze it.
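The bookkeeping described here can be sketched in a few lines: record where each bigram occurs, take the distances between repeats, and tally which candidate lengths divide those distances. The function names and the upper bound `max_len` on candidate keyword lengths are assumptions made for this illustration.

```python
from collections import Counter, defaultdict

def repeated_ngram_distances(ciphertext: str, n: int = 2) -> dict:
    """Map each repeated n-gram to the distances between its successive occurrences."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - n + 1):
        positions[ciphertext[i:i + n]].append(i)
    return {
        gram: [b - a for a, b in zip(pos, pos[1:])]
        for gram, pos in positions.items()
        if len(pos) > 1
    }

def factor_tally(distances, max_len: int = 20) -> Counter:
    """Count how often each candidate length (2..max_len) divides a distance."""
    tally = Counter()
    for d in distances:
        for f in range(2, max_len + 1):
            if d % f == 0:
                tally[f] += 1
    return tally

ciphertext = "KSMEHZBBLKSMEMPOGAJXSEJCSFLZSY"
repeats = repeated_ngram_distances(ciphertext)
print(repeats)  # {'KS': [9], 'SM': [9], 'ME': [9]}
tally = factor_tally(d for ds in repeats.values() for d in ds)
print(tally.most_common())  # [(3, 3), (9, 3)]
```

With a ciphertext this short the tally cannot separate 3 from 9; on longer messages the true keyword length and its factors tend to dominate the counts.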

Once the length of the keyword is known, the ciphertext can be broken up into that many simple substitution cryptograms. That is, for a keyword of length 9, every 9th letter in the ciphertext was encrypted with the same keyword letter. Given the structure of the Vigenere tableau, this is equivalent to using 9 distinct simple substitution ciphers, each of which was derived from 1 of the 26 possible Caesar shifts given in the tableau. The pure Kasiski method proceeds by analyzing these simple substitution cryptograms using frequency analysis and the other standard techniques.
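Splitting the ciphertext this way is a matter of taking every n-th letter starting from each of the first n positions. A minimal sketch using the example ciphertext and a keyword length of 9; `split_columns` is an illustrative name.

```python
def split_columns(ciphertext: str, key_length: int) -> list:
    """Column i holds every letter that was enciphered with keyword letter i."""
    return [ciphertext[i::key_length] for i in range(key_length)]

columns = split_columns("KSMEHZBBLKSMEMPOGAJXSEJCSFLZSY", 9)
print(columns[0])  # KKJZ -- all enciphered with keyword letter R
print(columns[3])  # EEE  -- all enciphered with keyword letter A (a shift of 0)
```

Each column is a Caesar-shifted sample of the plaintext, although a 30-letter message leaves only three or four letters per column, far too few for frequency analysis.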

A variant of this method, proposed by the French cryptographer Kerckhoffs, is based on discovering the keyword itself and then using it to decipher the cryptogram. In Kerckhoffs's method, after the message has been separated into several columns, corresponding to the simple substitution cryptograms, one tallies the frequencies in each column and then uses frequency and logical analysis to construct the key. For example, suppose the most frequent letter in the first column is 'K'. We would hypothesize that 'K' corresponds to the English 'E'. If we consult the Vigenere tableau at this point, we can see that if English 'E' were enciphered into 'K' then row G of the table must have been the alphabet used for the first letter of the keyword. This implies that the first letter of the keyword is 'G'.
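The tableau lookup in this step is a subtraction modulo 26: the keyword letter is the most frequent ciphertext letter minus the plaintext letter it is assumed to represent. The sketch below encodes that single hypothesis; the column string is invented for illustration and `key_letter_guess` is a hypothetical helper name.

```python
from collections import Counter

A = ord("A")

def key_letter_guess(column: str, assumed_plain: str = "E") -> str:
    """Guess the keyword letter, assuming the column's most frequent
    letter stands for `assumed_plain`."""
    most_common = Counter(column).most_common(1)[0][0]
    return chr((ord(most_common) - ord(assumed_plain)) % 26 + A)

# A made-up column in which 'K' is the most frequent letter.
print(key_letter_guess("KQKXKOK"))  # G
```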

The problem with this "manual" approach is that for short messages there are often several good candidates for English 'E' in each column. This requires the testing of multiple hypotheses, which can get quite tedious and involved. Therefore we need a more sensitive test to discover the alphabet used by each letter of the keyword.

Recalling that each row of the Vigenere tableau is one of the 26 Caesar shifts, we can use the chi-square test to determine which of the 26 possible shifts was used for each letter of the keyword: for each column, the shift whose deciphered letter frequencies fit the expected English frequencies best (the lowest chi-square value) is chosen. This modern-day version of the Kerckhoffs method turns out to be very effective, and it is the algorithm used in CryptoToolJ's Vigenere Analyzer.
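The chi-square approach can be sketched as follows. This is an illustration of the general technique, not CryptoToolJ's actual code: the function names are hypothetical, the frequency table holds commonly quoted approximate percentages for English, and the text is assumed to contain only uppercase A-Z.

```python
A = ord("A")

# Approximate relative frequencies (%) of A..Z in English text (assumed values).
ENGLISH = [8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966,
           0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987,
           6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074]

def chi_square(column: str, shift: int) -> float:
    """Chi-square statistic comparing the column, with `shift` undone, to English."""
    counts = [0] * 26
    for ch in column:
        counts[(ord(ch) - A - shift) % 26] += 1
    n = len(column)
    total = 0.0
    for i in range(26):
        expected = n * ENGLISH[i] / 100
        total += (counts[i] - expected) ** 2 / expected
    return total

def best_shift(column: str) -> str:
    """Return the keyword letter whose shift gives the lowest chi-square value."""
    return chr(min(range(26), key=lambda s: chi_square(column, s)) + A)

def recover_keyword(ciphertext: str, key_length: int) -> str:
    """Apply best_shift to each of the key_length columns of the ciphertext."""
    return "".join(best_shift(ciphertext[i::key_length]) for i in range(key_length))

# Usage: a single column made by shifting an English sample by 6 (row G).
sample = ("ONEOFTHEMAINPROBLEMSWITHSIMPLESUBSTITUTIONCIPHERSISTHATTHEY"
          "ARESOVULNERABLETOFREQUENCYANALYSIS")
column = "".join(chr((ord(c) - A + 6) % 26 + A) for c in sample)
print(best_shift(column))
```

Unlike the single "most frequent letter is E" hypothesis, the statistic weighs all 26 letter counts at once, so it is less easily misled. Very short columns can still produce a wrong keyword letter.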