hcrypto.analyzer.tool
Class PatternDictionary

java.lang.Object
  extended by hcrypto.analyzer.tool.Dictionary
      extended by hcrypto.analyzer.tool.PatternDictionary

public class PatternDictionary
extends Dictionary

Implements a searchable dictionary of words and their relative frequencies. It is designed to be used by cryptanalysis objects. It stores words by pattern. For example, "there" and "these" both have the pattern 12343, so the key for both words is "12343" and both are stored in the Hashtable associated with that key.

File format: PatternDictionary expects its source file to contain words and frequencies, one pair per line. A good example is the file kucera340.txt, which contains the 340 most frequent words from the Kucera-Francis word list, available from the MRC Psycholinguistic database: http://www.psy.uwa.edu.au/mrcdatabase/uwa_mrc.htm

The program assumes that the first line of the file gives the total number of words in the corpus that was used to compile the relative frequencies. The relative frequencies are integer values. For example, the Kucera-Francis word list is based on a corpus with 1,000,000 words. The format is:

    TOTAL_WORDS 1000000
    THE 69971
    OF 36411
    ...
    ...

To Test:

    java -classpath classes hcrypto.analyzer.tool.PatternDictionary kucera340.txt
    java -classpath classes hcrypto.analyzer.tool.PatternDictionary sourcefile
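The file format described above could be read along the following lines. This is only a minimal sketch and not the class's actual loader: the class name WordListSketch and the method parse are hypothetical, and dividing each integer count by the TOTAL_WORDS value to obtain a fractional frequency is an assumption about how the counts might be used.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; WordListSketch is a hypothetical name, not part of hcrypto.
class WordListSketch {

    // Parses lines of the form "TOTAL_WORDS n" followed by "WORD count" pairs.
    // Returns each word mapped to count / n (an assumed definition of frequency).
    static Map<String, Double> parse(List<String> lines) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("empty word list");
        }
        String[] header = lines.get(0).trim().split("\\s+");
        if (header.length != 2 || !header[0].equals("TOTAL_WORDS")) {
            throw new IllegalArgumentException("first line must be TOTAL_WORDS nnnnn");
        }
        long total = Long.parseLong(header[1]);

        Map<String, Double> frequencies = new LinkedHashMap<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 2) {
                continue; // skip blank lines and anything that is not a word/count pair
            }
            frequencies.put(parts[0], Integer.parseInt(parts[1]) / (double) total);
        }
        return frequencies;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("TOTAL_WORDS 1000000", "THE 69971", "OF 36411");
        System.out.println(parse(sample));
    }
}
```

To read a real file, the lines could be obtained with java.nio.file.Files.readAllLines before calling parse. A count that is not an integer would raise NumberFormatException in this sketch.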


Field Summary
 
Fields inherited from class hcrypto.analyzer.tool.Dictionary
BIG_DICT, KUCERA_100, KUCERA_340, KUCERA_3500, KUCERA_50, MIN_FREQ
 
Constructor Summary
PatternDictionary()
           
PatternDictionary(int i)
           
PatternDictionary(java.lang.String filename)
          This constructor creates a dictionary from the named file.
 
Method Summary
 boolean contains(java.lang.String word)
           
 boolean containsPattern(java.lang.String pattern)
           
 int countWordsForPattern(java.lang.String pattern)
           
 double getFreq(java.lang.String word)
           
 double getFrequency(java.lang.String word)
           
 java.lang.String[] getPatternWordArray(java.lang.String word)
           
 java.lang.String getWordList(java.lang.String word)
           
static void main(java.lang.String[] args)
           
static java.lang.String makePattern(java.lang.String s)
          This method returns a pattern of the string.
 int nWords()
           
 
Methods inherited from class hcrypto.analyzer.tool.Dictionary
getDescriptor, getDictionaryName, size
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PatternDictionary

public PatternDictionary()

PatternDictionary

public PatternDictionary(int i)

PatternDictionary

public PatternDictionary(java.lang.String filename)
This constructor creates a dictionary from the named file. It assumes that words and frequencies are listed one pair per line, with the first line giving the corpus size in the form TOTAL_WORDS nnnnn.

Parameters:
filename - a String giving the name of the dictionary file
Method Detail

nWords

public int nWords()

getFrequency

public double getFrequency(java.lang.String word)

makePattern

public static java.lang.String makePattern(java.lang.String s)
This method returns a pattern of the string. For example, if the word is "there" the pattern would be "12343". Each distinct letter is assigned the next unused symbol in order of first appearance. Words that need more than 9 symbols continue with UPPERCASE letters after 9. For example, the word "appendectomy" would have the pattern "12234536789A".
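The pattern encoding can be sketched as follows. This is an illustration consistent with the two documented examples, not the library's source: the class name PatternSketch and the helper groupByPattern are hypothetical, and folding input to lower case and supporting only alphabetic words are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only; PatternSketch is a hypothetical name, not part of hcrypto.
class PatternSketch {

    // Symbol for the 1st, 2nd, ... distinct letter: digits 1-9, then A-Z.
    private static final String SYMBOLS = "123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Replaces each letter with the symbol assigned at its first appearance.
    // Assumes an alphabetic word, so at most 26 distinct symbols are needed.
    static String makePattern(String s) {
        Map<Character, Character> assigned = new HashMap<>();
        StringBuilder pattern = new StringBuilder();
        for (char c : s.toLowerCase(Locale.ROOT).toCharArray()) {
            Character symbol = assigned.get(c);
            if (symbol == null) {
                symbol = SYMBOLS.charAt(assigned.size());
                assigned.put(c, symbol);
            }
            pattern.append(symbol);
        }
        return pattern.toString();
    }

    // Groups words under their pattern key, mirroring the storage idea
    // described in the class overview.
    static Map<String, List<String>> groupByPattern(List<String> words) {
        Map<String, List<String>> byPattern = new HashMap<>();
        for (String word : words) {
            byPattern.computeIfAbsent(makePattern(word), k -> new ArrayList<>()).add(word);
        }
        return byPattern;
    }

    public static void main(String[] args) {
        System.out.println(makePattern("there"));        // 12343
        System.out.println(makePattern("appendectomy")); // 12234536789A
        System.out.println(groupByPattern(Arrays.asList("there", "these")).get("12343")); // [there, these]
    }
}
```

Because two words share a key exactly when their letters repeat in the same positions, a lookup by pattern returns every candidate plaintext word for a ciphertext word under a simple substitution cipher.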


getFreq

public double getFreq(java.lang.String word)
Overrides:
getFreq in class Dictionary

contains

public boolean contains(java.lang.String word)
Overrides:
contains in class Dictionary

containsPattern

public boolean containsPattern(java.lang.String pattern)

countWordsForPattern

public int countWordsForPattern(java.lang.String pattern)

getPatternWordArray

public java.lang.String[] getPatternWordArray(java.lang.String word)

getWordList

public java.lang.String getWordList(java.lang.String word)

main

public static void main(java.lang.String[] args)