CPSC 352 -- Artificial Intelligence
Notes: AI Algorithms & Approaches: Word-based Ga for Short Cryptograms

Introduction

This page is related to the paper A Word-Based Genetic Algorithm for Cryptanalysis of Short Cryptograms (PDF).

This paper demonstrates the feasibility of a word-based genetic algorithm (GA) for solving short substitution cryptograms such as the kind found in newspapers and puzzle books. Although GA's based on analysis of letter, digram, or trigram frequencies have been used on substitution cryptograms, they are not able to solve short (10-30 word) cryptograms of the sort we address. By using a relatively small dictionary of frequent words to initialize a set of substitution keys, and by employing a word-based crossing mechanism, the GA achieves performance that is comparable to deterministic word-based algorithms.

Hands-on Demo/Exercises

  • Download and unjar the hcrypto-352.jar file using the commands:
    $ mkdir ga-crypto
    $ cd ga-crypto
    $ mv /hcrypto-352.jar .
    $ jar xvf hcrypto-352.jar
    

    Your directory should look something like this:

    $ ls -l
    drwxr-xr-x 5 ram ram     4096 2011-12-02 13:58 applications
    drwxr-xr-x 2 ram ram     4096 2004-12-09 14:27 books
    drwxr-xr-x 3 ram ram     8192 2008-11-14 06:30 cgrams
    drwxr-xr-x 5 ram ram     4096 2007-11-20 00:38 classes
    -rw-r--r-- 1 ram ram  4737845 2011-12-02 14:03 hcrypto-352.jar
    drwxr-xr-x 2 ram ram     4096 2011-12-02 13:34 META-INF
    drwxr-xr-x 5 ram ram     4096 2009-10-15 14:39 source
    $
    

    Running the WordBasedGaAnalyzer

    Run the TestGaAnalyzer program, which is in the applications/testanalyzer directory. You should change into that directory. Note how the classpath is set.

    When you run the program, it will tell you what command-line arguments it needs:

    $ cd applications/testanalyzer
    $ java -classpath ../../classes/:. TestGaAnalyzer
    Usage: java TestGaAnalyzer analyzers.GaClassName ParamFile
    E.G: TestGaAnalyzer analyzers.WordBasedGaAnalyzer params.txt
    See: classes/analyzers for available GA analyzers.
    See: params.txt for an example of a parameter file.
    ram@java:~/public_html/cpsc352/hcrypto/applications/testanalyzer$ 
    

    The parameter files for the GA are in the paramfiles directory. You want to use the analyzers.WordBasedGaAnalyzer class for your analysis

    $ java -classpath ../../classes/:. TestGaAnalyzer analyzers.WordBasedGaAnalyzer paramfiles/params2.txt
    GaANALYZER: analyzers.WordBasedGaAnalyzer
    PARAMETERS: file=../../cgrams/hc1.5.03.txt     crossrate=0 mutaterate=1 generations=200 verbose=true tweaking=false size=2048 book=./book.txt
    SEED DICTIONARY
    Kucera50
    EVAL DICTIONARY
    Kucera3500
    SOLUTION: (20) 
    we go aboard our cruising ship
    anticipating one fine trip
    before too long we all grow pale
    and race each other toward the rail
    
    NPAIRS = 186 POPULATION SIZE = 2048
    USING RANDOM SEEDING
    STARTING ANALYSIS nTokens 24 nWords 0 nWordsSolution 20 nChars 127 nIndivs 2048
    TEXT: yq ju ciucxe ugx nxgrbrpj barf
    cpvrnrfcvrpj upq drpq vxrf
    iqduxq vuu kupj yq ckk jxuy fckq
    cpe xcnq qcna uvaqx vuycxe vaq xcrk
    
    
    ELITIST SELECTION ELITIST MUTATION 
    1 5.08 0% 0 ERRS 4096 KEYS zywtqxsajponkurmhcvilgfedb mutated=4096 median = 2.02 worst = 1.01
    2 5.1 0% 0 ERRS 8192 KEYS pzwvntscyromeqjlkuihgfxdba mutated=3233 median = 2.02 worst = 1.01
    ...
    200 10.1 4.17% 0 ERRS 819200 KEYS cyhkqnvorwlafpumbsxjtzdgei mutated=2409 median = 5.01 worst = 4.03
    Finished: Iterations = 200 Best score is 10.1 KeyCount = 819200
    Crosses improved= 0 worsened= 0 nochange= 0
    Unused = abdeghilmnoprstvwxz=hlmostwz
    HERE'S THE TOP 1  RESULTS:
    
    Key (0) = cyhkqnvorwlafpumbsxjtzdgei (10.096697708737864) betobeaddmade
    -------------
    DECRYPTED MESSAGE(0): 
    BE TO azoasy oxs fsxiqint qlim, angifimagint ONE WINE gsim, zewose goo dont BE ADD tsob made, ANY SAFE eafl ogles gobasy gle said. 
    SOLUTION(0):
    WE GO aboard OUR cruising ship, anticipating ONE FINE trip, BEFORE TOO LONG WE ALL GROW pale, AND RACE EACH OTHER TOWARD THE rail. 
    PERCENT WORDS: 4.17
    WRONG CHARS: 0
    -------------
    

    In the results, the ALL CAPS WORDS are words that are contained in the seed dictionary. As you see, in this case the "GA" did not do very well, but note its crossover and mutation rates.

    Edit the params2.txt file trying different parameter settings and observe and discuss the results. Lines that start with # are ignored. You can run 1 experiment at a time. Note that the current experiment (line without #) has a crossrate=0 mutaterate=1. This is like an un-GA -- i.e., random search based on random mutations (swaps) in the DNA. Copy and past that experiment and change the crossrate=0.9 and the mutaterate=0.1 and see the difference in results.

    $ more paramfiles/params2.txt
    #file=../../cgrams/hc1.5.03.txt     crossrate=0 mutaterate=1 generations=200 verbose=true tweaking=false size=512 book=./book.txt
    #file=../../cgrams/hc1.5.03.txt     crossrate=0 mutaterate=1 generations=200 verbose=true tweaking=false size=1024 book=./book.txt
    file=../../cgrams/hc1.5.03.txt     crossrate=0 mutaterate=1 generations=200 verbose=true tweaking=false size=2048 book=./book.txt
    #file=../../cgrams/hc11.9.02.txt     crossrate=0 mutaterate=1 generations=300 verbose=true tweaking=false size=512 book=./book.txt
    #file=../../cgrams/hc11.9.02.txt     crossrate=0.5 mutaterate=0.2 generations=200 verbose=true tweaking=false size=512 book=./book.txt
    #file=../../cgrams/hc11.9.02.txt     crossrate=0.5 mutaterate=0.5 generations=200 verbose=true tweaking=false size=512 book=./book.txt
    #file=../../cgrams/election.txt     crossrate=0.9 mutaterate=0.5 generations=50 verbose=true tweaking=false size=2048 book=./book.txt
    #file=../../cgrams/roses2.txt     crossrate=0.9 mutaterate=0.5 generations=75 verbose=true tweaking=false size=2048 book=./book.txt
    #file=../../cgrams/roses.txt     crossrate=0.9 mutaterate=0.5 generations=50 verbose=true tweaking=false size=2048 book=./book.txt
    #file=../../cgrams/obama2.txt     crossrate=0.5 mutaterate=0.5 generations=200 verbose=true tweaking=false size=512 book=./book.txt
    #file=../../cgrams/obama2.txt     crossrate=0.5 mutaterate=0.2 generations=200 verbose=true tweaking=false size=512 book=./book.txt
    #file=../../cgrams/obama2.txt     crossrate=0.75 mutaterate=0.5 generations=200 verbose=true tweaking=false size=512 book=./book.txt
    #file=../../cgrams/obama.txt     crossrate=0.5 mutaterate=0.5 generations=200 verbose=true tweaking=false size=512 book=./book.txt
    #file=../../cgrams/obama.txt     crossrate=0.5 mutaterate=0.2 generations=200 verbose=true tweaking=false size=512 book=./book.txt
    #file=../../cgrams/hc6.23.02.txt     crossrate=0.9 mutaterate=0.2 generations=50 verbose=true tweaking=false size=1024 book=./book.txt
    #file=../../cgrams/hc7.21.02.txt     crossrate=0.9 mutaterate=0.2 generations=50 verbose=true tweaking=false size=1024
    #file=../../cgrams/hc7.28.02.txt     crossrate=0 mutaterate=1 generations=200 verbose=true tweaking=false size=512
    #file=../../cgrams/hc.confused.txt     crossrate=0.9 mutaterate=0.2 generations=50 verbose=true tweaking=false size=1024
    #file=../../cgrams/hc.holiday.txt     crossrate=0 mutaterate=1 generations=200 verbose=true tweaking=false size=512
    #file=../../cgrams/vig.turnips.txt     crossrate=0.9 mutaterate=0.5 generations=200 verbose=true tweaking=false size=1024 book=./book.txt