hcrypto.analyzer.tool
Class PatternDictionary
java.lang.Object
hcrypto.analyzer.tool.Dictionary
hcrypto.analyzer.tool.PatternDictionary
public class PatternDictionary
- extends Dictionary
Implements a searchable dictionary of words and their
relative frequencies. It is designed to be used
by cryptanalysis objects. It stores words in terms of their
patterns. For example, "there" and "these" both have the pattern
12343. So the key for these words would be "12343" and both words
would be stored in the Hashtable associated with that key.
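The pattern-keyed storage described above can be pictured with a small sketch. This is not the actual PatternDictionary source: the class name PatternIndexSketch, its methods, and the frequency counts are hypothetical, and a HashMap stands in for the Hashtable mentioned in the description. Patterns are passed in directly here rather than computed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the hcrypto implementation.
class PatternIndexSketch {
    // Outer key: pattern such as "12343". Inner map: word -> frequency count.
    private final Map<String, Map<String, Integer>> byPattern = new HashMap<>();

    // Store a word under its pattern, creating the inner map on first use.
    void add(String pattern, String word, int count) {
        byPattern.computeIfAbsent(pattern, k -> new HashMap<>()).put(word, count);
    }

    // Return all words sharing the pattern, sorted alphabetically.
    List<String> wordsFor(String pattern) {
        Map<String, Integer> words =
                byPattern.getOrDefault(pattern, Collections.emptyMap());
        List<String> result = new ArrayList<>(words.keySet());
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        PatternIndexSketch index = new PatternIndexSketch();
        // The counts 10 and 5 are made-up placeholders, not corpus data.
        index.add("12343", "there", 10);
        index.add("12343", "these", 5);
        System.out.println(index.wordsFor("12343")); // prints [there, these]
    }
}
```

Grouping words by pattern means a cryptanalysis routine can take a ciphertext word, compute its pattern, and retrieve every candidate plaintext word in one lookup.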
File format: PatternDictionary expects its source file to contain
words and frequencies, one word-frequency pair per line. A good example
is the file kucera340.txt, which is a file of the 340 most
frequent words taken from the Kucera-Francis word list, which is
available from the MRC Psycholinguistic database:
http://www.psy.uwa.edu.au/mrcdatabase/uwa_mrc.htm
The program assumes that the first line of the file gives the total
number of words in the corpus that was used to compile the relative
frequencies. The relative frequencies are integer values. For example,
the Kucera-Francis word list is based on a corpus with 1,000,000 words.
The format is:
TOTAL_WORDS 1000000
THE 69971
OF 36411
... ...
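A minimal sketch of reading this format follows. It is not the library's own parser. The class name FrequencyFileSketch and its members are hypothetical, and the sketch assumes that fields are separated by whitespace, that words are matched case-insensitively, and that a relative frequency is the word's count divided by TOTAL_WORDS. It parses text held in a String; reading the file into that String is left to the caller.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the hcrypto implementation.
class FrequencyFileSketch {
    long totalWords;                                   // value from the TOTAL_WORDS line
    final Map<String, Integer> counts = new LinkedHashMap<>();

    // Parse the whole file content: header line first, then "WORD count" lines.
    static FrequencyFileSketch parse(String text) {
        FrequencyFileSketch result = new FrequencyFileSketch();
        String[] lines = text.split("\\R");
        String[] header = lines[0].trim().split("\\s+");
        if (header.length != 2 || !header[0].equals("TOTAL_WORDS")) {
            throw new IllegalArgumentException("first line must be TOTAL_WORDS nnnnn");
        }
        result.totalWords = Long.parseLong(header[1]);
        for (int i = 1; i < lines.length; i++) {
            String[] parts = lines[i].trim().split("\\s+");
            if (parts.length != 2) {
                continue;                              // skip blank or malformed lines
            }
            result.counts.put(parts[0].toLowerCase(), Integer.parseInt(parts[1]));
        }
        return result;
    }

    // Assumed definition: count divided by corpus size; 0.0 for unknown words.
    double relativeFrequency(String word) {
        Integer count = counts.get(word.toLowerCase());
        return count == null ? 0.0 : (double) count / totalWords;
    }

    public static void main(String[] args) {
        FrequencyFileSketch dict =
                parse("TOTAL_WORDS 1000000\nTHE 69971\nOF 36411\n");
        System.out.println(dict.relativeFrequency("the"));
    }
}
```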
To Test:
java -classpath classes hcrypto.analyzer.tool.PatternDictionary kucera340.txt
java -classpath classes hcrypto.analyzer.tool.PatternDictionary sourcefile
Methods inherited from class java.lang.Object:
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
PatternDictionary
public PatternDictionary()
PatternDictionary
public PatternDictionary(int i)
PatternDictionary
public PatternDictionary(java.lang.String filename)
- This constructor creates a dictionary from the named file.
It assumes that words and frequencies are listed one pair per line,
with the first line giving the corpus size in the form
TOTAL_WORDS nnnnn.
- Parameters:
filename - a String giving the name of the dictionary file
nWords
public int nWords()
getFrequency
public double getFrequency(java.lang.String word)
makePattern
public static java.lang.String makePattern(java.lang.String s)
- This method returns the pattern of the string. For example,
if the word is "there" the pattern would be "12343". Words with more
than 9 distinct letters continue the pattern with UPPERCASE letters,
starting at A for the tenth distinct letter. For example, the
word "appendectomy" would have the pattern "12234536789A".
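One way such a pattern could be computed is sketched below. This is an illustration consistent with the two documented examples, not the library's actual code. The class name MakePatternSketch is hypothetical, and folding the input to lowercase before comparing letters is an assumption.

```java
// Hypothetical illustration only; not the hcrypto implementation.
class MakePatternSketch {

    // Number each distinct letter by order of first appearance:
    // digits 1-9 for the first nine, then A, B, C, ... for later ones.
    static String makePattern(String s) {
        StringBuilder seen = new StringBuilder();     // distinct letters so far, in order
        StringBuilder pattern = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = Character.toLowerCase(s.charAt(i)); // assumed case-insensitive
            int idx = seen.indexOf(String.valueOf(c));
            if (idx < 0) {                            // first occurrence of this letter
                seen.append(c);
                idx = seen.length() - 1;
            }
            char symbol = idx < 9 ? (char) ('1' + idx) : (char) ('A' + (idx - 9));
            pattern.append(symbol);
        }
        return pattern.toString();
    }

    public static void main(String[] args) {
        System.out.println(makePattern("there"));        // prints 12343
        System.out.println(makePattern("appendectomy")); // prints 12234536789A
    }
}
```

Because the encoding depends only on which positions repeat, a word enciphered with a simple substitution cipher has the same pattern as its plaintext.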
getFreq
public double getFreq(java.lang.String word)
- Overrides:
getFreq in class Dictionary
contains
public boolean contains(java.lang.String word)
- Overrides:
contains in class Dictionary
containsPattern
public boolean containsPattern(java.lang.String pattern)
countWordsForPattern
public int countWordsForPattern(java.lang.String pattern)
getPatternWordArray
public java.lang.String[] getPatternWordArray(java.lang.String word)
getWordList
public java.lang.String getWordList(java.lang.String word)
main
public static void main(java.lang.String[] args)