CPSC 115L: Introduction to Computing Fall 2010

Lab 12: Analyzing Text

December 1, 2

As usual, you are expected to work with an assigned partner as a pair. Both you and your partner will receive the same grade. Both of you should always save your laboratory work on your own accounts.

Objectives

The main objectives of this assignment are
  1. to give you practice using one-dimensional Java arrays.

Introduction

In this lab we will analyze a text file by computing the relative frequency of its individual letters.

You will define and use two classes: the FrequencyRecord class, which stores the data for this problem, and the FrequencyAnalyzer class, which performs the analysis.

The FrequencyRecord Class

Define a FrequencyRecord class that meets the following design specs (+ means public, - means private):

FrequencyRecord
- letter: char
- count: int
+ FrequencyRecord(ch: char)
+ getLetter(): char
+ setLetter(ch: char)
+ getCount(): int
+ setCount(n: int)
+ toString(): String

The FrequencyRecord class has two instance variables, a letter and its count. It has a constructor, the standard getters and setters, and a toString() method. Its toString() method should return a String representing the character and its count, for example:

e: 504

Later in the lab you will create and use an array of FrequencyRecords to store the relative frequencies of the letters in a text.

TODO: Define and test the FrequencyRecord class. Write the Java code needed to define this class. Include a main() method in which you create several instances of FrequencyRecord objects and test that all the methods work correctly. Here's some sample testing code:

FrequencyRecord r1 = new FrequencyRecord('a');  // Record for letter 'a'
System.out.println(r1);                         // Print the record
r1.setCount(1);                                 // Set its count to 1
System.out.println(r1);                         // Print it again
// etc.
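
For orientation, here is a minimal sketch of how a class matching the design above might look. Treat it as one possible shape rather than the required solution. Two details are assumptions: that a new record starts with a count of 0, and that toString() uses the "letter: count" spacing seen in the sample e: 504.

```java
// One possible shape for the class; not the required solution.
class FrequencyRecord {
    private char letter;   // the letter this record tracks
    private int count;     // assumption: a new record starts at 0

    public FrequencyRecord(char ch) {
        letter = ch;
        count = 0;
    }

    public char getLetter() { return letter; }
    public void setLetter(char ch) { letter = ch; }
    public int getCount() { return count; }
    public void setCount(int n) { count = n; }

    // Formats the record as "letter: count", e.g. "e: 504".
    public String toString() {
        return letter + ": " + count;
    }
}
```

With this sketch, the sample testing code above would print a: 0 and then a: 1.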

The FrequencyAnalyzer Class

Define a FrequencyAnalyzer class that meets the following design specs:

FrequencyAnalyzer
- frequencies: FrequencyRecord[]
- text: String
+ FrequencyAnalyzer(filename: String)
+ getText(): String
+ setText(s: String)
+ readFile(name: String): String
+ countLetters(s:String):FrequencyRecord[]
+ toString(): String

According to this design, frequencies is an array of FrequencyRecords. When you create this array, you will need 26 records, representing the letters a..z, each storing the letter and its count. The record for letter 'a' is stored in frequencies[0] and the record for letter 'z' is stored in frequencies[25].

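Reading a Text File. You will need a method to read the contents of a text file and return the text as a String. Copy and paste the following method into your FrequencyAnalyzer class, and add import java.io.*; at the top of the file, since the method uses File, FileInputStream, InputStreamReader, and IOException: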

    /**
     * Reads the named text file and returns its contents as a string.
     * @param fName is a string giving the file's name (full path)
     * @return the contents of fName as a String
     */
    public String readFile(String fName) {
	String msg="";
	try {
	    File theFile = new File(fName);
	    InputStreamReader iStream = new InputStreamReader( new FileInputStream(theFile));
	    int length = (int)theFile.length();
	    char input[] = new char[length];
	    iStream.read(input);
	    msg = new String(input);
	} catch (IOException e) {
	    e.printStackTrace();
	} // catch
	return msg;
    }
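
The method above assumes that a single read() call fills the whole array and it never closes the stream. That is normally fine for the small text files in this lab. As a hedged alternative, the sketch below reads in a loop and closes the file automatically. The class name ReadFileSketch and the 4096-character buffer size are illustrative choices, and the try-with-resources statement requires Java 7 or later.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

// Illustrative alternative to readFile(); the class name is hypothetical.
class ReadFileSketch {

    // Reads the named text file and returns its contents,
    // or the text read so far if an I/O error occurs.
    static String readFile(String fName) {
        StringBuilder text = new StringBuilder();
        // try-with-resources closes the reader even if read() fails
        try (Reader in = new FileReader(fName)) {
            char[] buffer = new char[4096];
            int n;
            // read() returns the number of chars read, or -1 at end of file
            while ((n = in.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return text.toString();
    }
}
```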

Using Command-Line Arguments. You will also need a main() method that can read the name of the input file from the command line. Copy and paste the following method into your FrequencyAnalyzer class. Note here that args[0] in the following code stores the name of the input file:

    public static void main(String args[]) {
	String msg="";
	FrequencyAnalyzer analyzer = new FrequencyAnalyzer();
	if (args.length == 0) {
	    System.out.println("Usage: java FrequencyAnalyzer filename");
	    System.exit(0);
	} else {
	    msg = analyzer.readFile(args[0]);
	    System.out.println(msg);
	}
    }

TODO: Once you have incorporated these two methods into your FrequencyAnalyzer class, compile and run it as follows (to print out the contents of your FrequencyAnalyzer.java file):

$ javac FrequencyAnalyzer.java
$ java FrequencyAnalyzer
Usage: java FrequencyAnalyzer filename
$ java FrequencyAnalyzer FrequencyAnalyzer.java
// The file will be printed..
$ 

As you see here, the name of the input file follows the class name on the command line. Java passes it to main() as args[0].

TODO: Once you have confirmed that you can correctly read a text file, complete the coding of the FrequencyAnalyzer class:

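  • Implement the constructor so that it takes the name of an input file (in the current directory) and calls the readFile() method to read it into the text instance variable. Once the class declares this constructor, Java no longer provides a default no-argument constructor, so main() must create the analyzer with new FrequencyAnalyzer(args[0]) after checking args.length.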
  • The constructor should also call the countLetters() method (when it is implemented). This method returns an array of FrequencyRecord, which you should assign to the frequencies variable.
  • Implement and test the setters and getters.
  • Implement and test the countLetters() method. It should take the text and return an array of FrequencyRecords containing the count of each letter in the text. (Don't forget about upper and lower case.)
    • Hint: To convert a String s to lowercase use: s = s.toLowerCase().
    • Hint: In addition to creating the FrequencyRecord[] array, you need to create 26 FrequencyRecords and put them in the array at locations 0 to 25.
    • Hint: When you are creating individual FrequencyRecords, you need to pass a char to the constructor. Here's how to convert an int k into a char value: char ch = (char)('a' + k)
    • Hint: You need to ignore characters other than 'a' to 'z'.
    • Hint: Use subtraction to convert a letter to an int: e.g., 'b' - 'a' = 1.
    • Hint: Store the FrequencyRecord for letter 'a' in location 0 of the array.
    • Hint: Notation to refer to letter a's count: frequencies[0].getCount()
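
The index arithmetic in the hints above can be sketched as follows. This is a deliberately simplified illustration that counts into a plain int[] rather than an array of FrequencyRecords, so it is not the countLetters() method the lab asks for. The class name LetterCountSketch and the sample string are made up for the example.

```java
// Simplified sketch: counts letters with a plain int[] instead of
// FrequencyRecord objects, to show only the index arithmetic.
class LetterCountSketch {

    // Returns 26 counts; index 0 holds the count for 'a', index 25 for 'z'.
    static int[] countLetters(String s) {
        int[] counts = new int[26];
        s = s.toLowerCase();                 // treat 'A' and 'a' alike
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch >= 'a' && ch <= 'z') {    // ignore everything else
                counts[ch - 'a']++;          // 'b' - 'a' == 1, and so on
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countLetters("Hello, World!");
        for (int k = 0; k < counts.length; k++) {
            char ch = (char) ('a' + k);      // convert the index back to a letter
            if (counts[k] > 0) {
                System.out.println(ch + ": " + counts[k]);
            }
        }
    }
}
```

In your own version, each array slot would hold a FrequencyRecord, and the increment would go through getCount() and setCount().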

As a confirmation of this task, print the letter frequencies for tomsawyer.txt:

a: 334
b: 74
c: 85
d: 223
e: 504
f: 94
g: 79
h: 303
i: 281
j: 11
k: 57
l: 182
m: 144
n: 301
o: 413
p: 79
q: 3
r: 241
s: 277
t: 448
u: 144
v: 40
w: 113
x: 5
y: 112
z: 1

Sorting

Write a method that implements the bubblesort algorithm to sort your frequencies from highest to lowest:

 // Bubblesort algorithm
 for (int i = 1; i < n; i++)
   for (int j = 0; j < n - i; j++)
     if (A[j] > A[j + 1])
       swap A[j] with A[j + 1];

Note that this algorithm sorts an array of int into ascending order. You need to revise it so that it sorts an array of FrequencyRecord from highest to lowest count. Your method should have the signature void sort(FrequencyRecord[] freqs).
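
As a starting point, here is the pseudocode written out as runnable Java for an int array, with the swap spelled out. It is a sketch of the unrevised algorithm, not the sort() method you need to hand in. The class name BubbleSortSketch and the sample data are made up for the example.

```java
// The bubblesort pseudocode as runnable Java, still for int arrays.
class BubbleSortSketch {

    // Sorts A into ascending order, following the pseudocode step by step.
    static void sort(int[] A) {
        int n = A.length;
        for (int i = 1; i < n; i++) {
            for (int j = 0; j < n - i; j++) {
                if (A[j] > A[j + 1]) {       // use < instead for descending order
                    int temp = A[j];         // swap A[j] with A[j + 1]
                    A[j] = A[j + 1];
                    A[j + 1] = temp;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] data = {334, 74, 85, 223, 504};
        sort(data);
        // prints [74, 85, 223, 334, 504]
        System.out.println(java.util.Arrays.toString(data));
    }
}
```

For an array of FrequencyRecord, the comparison would use getCount() on the two neighboring records, and the swap would exchange the record references themselves.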

Your program should produce the following output:

e: 504
t: 448
o: 413
a: 334
h: 303
n: 301
i: 281
s: 277
r: 241
d: 223
l: 182
m: 144
u: 144
w: 113
y: 112
f: 94
c: 85
g: 79
p: 79
b: 74
k: 57
v: 40
j: 11
x: 5
q: 3
z: 1

Documentation

Place a comment block at the beginning of each of your files, FrequencyRecord.java and FrequencyAnalyzer.java, using the javadoc format -- e.g.,
/**
 *  File: FrequencyRecord.java
 *  Name: Your name
 */

Place a javadoc comment block before each method describing what the method does and the purpose of each of its parameters and return values -- e.g.:

/**
 * Sums an array of integers.
 * @param arr -- the array of integers
 * @return -- the sum of the elements in the array
 */
public int sum (int arr[]) {
  int sum = 0;
  for (int k = 0; k < arr.length; k++)
     sum += arr[k];
  return sum;
}

What to hand in

Hand in your source code for each of your files and a copy of your outputs.

