This page is related to the paper A Word-Based Genetic Algorithm for Cryptanalysis of Short Cryptograms (PDF).
This paper demonstrates the feasibility of a word-based genetic algorithm (GA) for solving short substitution cryptograms such as the kind found in newspapers and puzzle books. Although GA's based on analysis of letter, digram, or trigram frequencies have been used on substitution cryptograms, they are not able to solve short (10-30 word) cryptograms of the sort we address. By using a relatively small dictionary of frequent words to initialize a set of substitution keys, and by employing a word-based crossing mechanism, the GA achieves performance that is comparable to deterministic word-based algorithms.
$ mkdir ga-crypto $ cd ga-crypto $ mv/hcrypto-352.jar . $ jar xvf hcrypto-352.jar
Your directory should look something like this:
$ ls -l drwxr-xr-x 5 ram ram 4096 2011-12-02 13:58 applications drwxr-xr-x 2 ram ram 4096 2004-12-09 14:27 books drwxr-xr-x 3 ram ram 8192 2008-11-14 06:30 cgrams drwxr-xr-x 5 ram ram 4096 2007-11-20 00:38 classes -rw-r--r-- 1 ram ram 4737845 2011-12-02 14:03 hcrypto-352.jar drwxr-xr-x 2 ram ram 4096 2011-12-02 13:34 META-INF drwxr-xr-x 5 ram ram 4096 2009-10-15 14:39 source $
Run the TestGaAnalyzer program, which is in the applications/testanalyzer directory. You should change into that directory. Note how the classpath is set.
When you run the program, it will tell you what command-line arguments it needs:
$ cd applications/testanalyzer $ java -classpath ../../classes/:. TestGaAnalyzer Usage: java TestGaAnalyzer analyzers.GaClassName ParamFile E.G: TestGaAnalyzer analyzers.WordBasedGaAnalyzer params.txt See: classes/analyzers for available GA analyzers. See: params.txt for an example of a parameter file. ram@java:~/public_html/cpsc352/hcrypto/applications/testanalyzer$
The parameter files for the GA are in the paramfiles directory. You want to use the analyzers.WordBasedGaAnalyzer class for your analysis
$ java -classpath ../../classes/:. TestGaAnalyzer analyzers.WordBasedGaAnalyzer paramfiles/params2.txt GaANALYZER: analyzers.WordBasedGaAnalyzer PARAMETERS: file=../../cgrams/hc1.5.03.txt crossrate=0 mutaterate=1 generations=200 verbose=true tweaking=false size=2048 book=./book.txt SEED DICTIONARY Kucera50 EVAL DICTIONARY Kucera3500 SOLUTION: (20) we go aboard our cruising ship anticipating one fine trip before too long we all grow pale and race each other toward the rail NPAIRS = 186 POPULATION SIZE = 2048 USING RANDOM SEEDING STARTING ANALYSIS nTokens 24 nWords 0 nWordsSolution 20 nChars 127 nIndivs 2048 TEXT: yq ju ciucxe ugx nxgrbrpj barf cpvrnrfcvrpj upq drpq vxrf iqduxq vuu kupj yq ckk jxuy fckq cpe xcnq qcna uvaqx vuycxe vaq xcrk ELITIST SELECTION ELITIST MUTATION 1 5.08 0% 0 ERRS 4096 KEYS zywtqxsajponkurmhcvilgfedb mutated=4096 median = 2.02 worst = 1.01 2 5.1 0% 0 ERRS 8192 KEYS pzwvntscyromeqjlkuihgfxdba mutated=3233 median = 2.02 worst = 1.01 ... 200 10.1 4.17% 0 ERRS 819200 KEYS cyhkqnvorwlafpumbsxjtzdgei mutated=2409 median = 5.01 worst = 4.03 Finished: Iterations = 200 Best score is 10.1 KeyCount = 819200 Crosses improved= 0 worsened= 0 nochange= 0 Unused = abdeghilmnoprstvwxz=hlmostwz HERE'S THE TOP 1 RESULTS: Key (0) = cyhkqnvorwlafpumbsxjtzdgei (10.096697708737864) betobeaddmade ------------- DECRYPTED MESSAGE(0): BE TO azoasy oxs fsxiqint qlim, angifimagint ONE WINE gsim, zewose goo dont BE ADD tsob made, ANY SAFE eafl ogles gobasy gle said. SOLUTION(0): WE GO aboard OUR cruising ship, anticipating ONE FINE trip, BEFORE TOO LONG WE ALL GROW pale, AND RACE EACH OTHER TOWARD THE rail. PERCENT WORDS: 4.17 WRONG CHARS: 0 -------------
In the results, the ALL CAPS WORDS are words that are contained in the seed dictionary. As you see, in this case the "GA" did not do very well, but note its crossover and mutation rates.
Edit the params2.txt file trying different parameter settings and observe and discuss the results. Lines that start with # are ignored. You can run 1 experiment at a time. Note that the current experiment (line without #) has a crossrate=0 mutaterate=1. This is like an un-GA -- i.e., random search based on random mutations (swaps) in the DNA. Copy and past that experiment and change the crossrate=0.9 and the mutaterate=0.1 and see the difference in results.
$ more paramfiles/params2.txt #file=../../cgrams/hc1.5.03.txt crossrate=0 mutaterate=1 generations=200 verbose=true tweaking=false size=512 book=./book.txt #file=../../cgrams/hc1.5.03.txt crossrate=0 mutaterate=1 generations=200 verbose=true tweaking=false size=1024 book=./book.txt file=../../cgrams/hc1.5.03.txt crossrate=0 mutaterate=1 generations=200 verbose=true tweaking=false size=2048 book=./book.txt #file=../../cgrams/hc11.9.02.txt crossrate=0 mutaterate=1 generations=300 verbose=true tweaking=false size=512 book=./book.txt #file=../../cgrams/hc11.9.02.txt crossrate=0.5 mutaterate=0.2 generations=200 verbose=true tweaking=false size=512 book=./book.txt #file=../../cgrams/hc11.9.02.txt crossrate=0.5 mutaterate=0.5 generations=200 verbose=true tweaking=false size=512 book=./book.txt #file=../../cgrams/election.txt crossrate=0.9 mutaterate=0.5 generations=50 verbose=true tweaking=false size=2048 book=./book.txt #file=../../cgrams/roses2.txt crossrate=0.9 mutaterate=0.5 generations=75 verbose=true tweaking=false size=2048 book=./book.txt #file=../../cgrams/roses.txt crossrate=0.9 mutaterate=0.5 generations=50 verbose=true tweaking=false size=2048 book=./book.txt #file=../../cgrams/obama2.txt crossrate=0.5 mutaterate=0.5 generations=200 verbose=true tweaking=false size=512 book=./book.txt #file=../../cgrams/obama2.txt crossrate=0.5 mutaterate=0.2 generations=200 verbose=true tweaking=false size=512 book=./book.txt #file=../../cgrams/obama2.txt crossrate=0.75 mutaterate=0.5 generations=200 verbose=true tweaking=false size=512 book=./book.txt #file=../../cgrams/obama.txt crossrate=0.5 mutaterate=0.5 generations=200 verbose=true tweaking=false size=512 book=./book.txt #file=../../cgrams/obama.txt crossrate=0.5 mutaterate=0.2 generations=200 verbose=true tweaking=false size=512 book=./book.txt #file=../../cgrams/hc6.23.02.txt crossrate=0.9 mutaterate=0.2 generations=50 verbose=true tweaking=false size=1024 book=./book.txt #file=../../cgrams/hc7.21.02.txt crossrate=0.9 mutaterate=0.2 generations=50 verbose=true tweaking=false size=1024 #file=../../cgrams/hc7.28.02.txt crossrate=0 mutaterate=1 generations=200 verbose=true tweaking=false size=512 #file=../../cgrams/hc.confused.txt crossrate=0.9 mutaterate=0.2 generations=50 verbose=true tweaking=false size=1024 #file=../../cgrams/hc.holiday.txt crossrate=0 mutaterate=1 generations=200 verbose=true tweaking=false size=512 #file=../../cgrams/vig.turnips.txt crossrate=0.9 mutaterate=0.5 generations=200 verbose=true tweaking=false size=1024 book=./book.txt