Lab 12: Constructing a Concordance (Final Graded Lab)
Solution: MakeConcordance.java
Solution: CEntry.java
Objectives
The objectives of this lab are:
- To evaluate your understanding of ADTs.
- To evaluate your ability to use the Java API to solve a programming problem.
The Problem
A concordance
is an alphabetical list of the words in a book. For this exercise
each concordance entry will consist of the following items: the word,
how many times it occurs in the book, and, for each occurrence, its line
number and its word position on that line. For example, the concordance
entry for the word "to" in the test document is:
to(7) locations: [1,5] [10,6] [18,7] [22,7] [31,12] [33,5] [34,1]
Thus, there were a total of 7 occurrences of "to" in this document:
the first occurrence was on line 1, word 5, the second on line
10, word 6, and so on. Note that the occurrences are stored in order
from first to last.
Download the file taleshort.txt,
which contains a few paragraphs from the well-known Dickens novel
A Tale of Two Cities (or the entire book,
Tale of Two Cities).
Write a Java program that will construct a concordance for any book
(stored as an ASCII text file) named on the command line. After the
program constructs the concordance, it should allow the user to
repeatedly enter words on the command line and it should display
the entry for that word. Here is some sample output:
$ java MakeConcordance taleshort.txt
Input a word to look up (or just hit <RET> to quit): to
to(7) locations: [1,5] [10,6] [18,7] [22,7] [31,12] [33,5] [34,1]
Input a word to look up (or just hit <RET> to quit): westminster
westminster(1) locations: [27,6]
Input a word to look up (or just hit <RET> to quit): impeach
That word does not occur in this book
Input a word to look up (or just hit <RET> to quit):
$
Strategy/Approach
Here are some questions to think about:
- How does this problem differ from last week's? What parts of
last week's code can you reuse to solve this week's problem? Can you use
a java.util.HashMap
again? Is it necessary to sort your data for this problem?
- Last week's problem involved defining a WordFreq
class that kept track of a word and its frequency. For this week's
problem you will need to revise or extend that class to also keep
track of a list of the word's occurrences, each of which consists of an
entry of the form [l, w], where l represents a line
number and
w represents the word's place on the line. These entries can
be Java Strings. What additional instance variables and methods will this
require?
For the list of occurrences, you should use a java.util.LinkedList
object.
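As one way to think about the HashMap question above, here is a minimal sketch that maps each word to a LinkedList of its "[l,w]" location strings. The class and method names (LocationMapSketch, record, lookup) are hypothetical and are not taken from the lab solution, and folding words to lower case is an assumption. In a full program the map values would more likely be concordance entry objects than bare lists.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

// Hypothetical sketch: names are illustrative, not the lab's solution.
class LocationMapSketch {
    // Maps each word to its locations, in the order they were recorded.
    private final Map<String, LinkedList<String>> locations = new HashMap<>();

    // Record that `word` appears on the given line at the given word position.
    // Assumption: words are folded to lower case before being stored.
    void record(String word, int line, int position) {
        locations.computeIfAbsent(word.toLowerCase(), k -> new LinkedList<>())
                 .add("[" + line + "," + position + "]");
    }

    // Return the recorded locations for `word`, or null if it never occurred.
    LinkedList<String> lookup(String word) {
        return locations.get(word.toLowerCase());
    }

    public static void main(String[] args) {
        LocationMapSketch map = new LocationMapSketch();
        map.record("to", 1, 5);
        map.record("To", 10, 6);
        System.out.println(map.lookup("to"));
        System.out.println(map.lookup("impeach"));
    }
}
```

Because the program only looks up individual words, a hash lookup by key is enough here; nothing in this sketch sorts the words.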
Sub Task: Reading a Text File
You should be able to adapt the code you developed last week for
this task. In this case, however, you need to be able to read
each line separately. Here's sample code that reads a text file
line by line:
// Read a file and print its lines
import java.io.*; // Import Java IO classes
...
try {
    File f = new File(args[0]);   // args[0] is the file named on the command line
    InputStreamReader iStream = new InputStreamReader(new FileInputStream(f));
    BufferedReader reader = new BufferedReader(iStream);
    String inString = reader.readLine();
    while (inString != null) {    // readLine() returns null at end of file
        System.out.println("LINE:" + inString);
        inString = reader.readLine();
    }
} catch (FileNotFoundException e) {
    System.err.println("Error: File " + args[0] + " not found");
    e.printStackTrace();
} catch (IOException e) {
    System.err.println("Error: I/O exception");
    e.printStackTrace();
}
...
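Once each line is read separately, the line number and the word's place on the line can be tracked with two counters. The sketch below is one possible approach, not the lab's solution: the names (LineSplitSketch, locate) are hypothetical, and the rule for what counts as a word (split on whitespace, drop every non-letter character, fold to lower case) is an assumption you may want to change. A StringReader stands in for the file so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names and the word-splitting rule are assumptions.
class LineSplitSketch {
    // Return one "word [line,position]" string per word on the line.
    // Positions start at 1, matching the sample entry [34,1].
    static List<String> locate(String line, int lineNumber) {
        List<String> result = new ArrayList<>();
        int position = 0;
        for (String token : line.trim().split("\\s+")) {
            // Assumption: keep letters only and fold to lower case.
            String word = token.replaceAll("[^A-Za-z]", "").toLowerCase();
            if (word.isEmpty()) {
                continue;  // blank line or a token that was pure punctuation
            }
            position++;
            result.add(word + " [" + lineNumber + "," + position + "]");
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the FileInputStream used above.
        BufferedReader reader = new BufferedReader(
                new StringReader("It was the best of times,\nit was the worst of times,"));
        int lineNumber = 0;
        String inString = reader.readLine();
        while (inString != null) {
            lineNumber++;
            for (String located : locate(inString, lineNumber)) {
                System.out.println(located);
            }
            inString = reader.readLine();
        }
    }
}
```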
Sub Task: Command-line Input
For command line input, you can use a java.io.BufferedReader
or java.util.Scanner
object. Here are the basic commands you need to use for a
BufferedReader:
BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
String inString = reader.readLine();
To repeatedly read the user's command, you would put
readLine() into a loop that exits when the reader object returns an
empty line (or null, which readLine() returns at end of input).
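The loop described above might look like the following sketch. The names (LookupLoopSketch, run) are hypothetical, and the map of ready-made entry strings is only a stand-in for a real concordance. The prompt and the "not found" message are copied from the sample output. The demonstration reads canned input from a StringReader so it runs unattended; in the real program you would pass a BufferedReader wrapped around System.in instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names are illustrative, not the lab's solution.
class LookupLoopSketch {
    // Prompt repeatedly until the user enters an empty line or input ends.
    static void run(BufferedReader in, PrintStream out, Map<String, String> entries) {
        try {
            while (true) {
                out.print("Input a word to look up (or just hit <RET> to quit): ");
                String inString = in.readLine();
                if (inString == null || inString.trim().isEmpty()) {
                    break;  // null means end of input; empty means the user quit
                }
                // Assumption: lookups are case-insensitive.
                String entry = entries.get(inString.trim().toLowerCase());
                out.println(entry != null ? entry : "That word does not occur in this book");
            }
        } catch (IOException e) {
            System.err.println("Error: I/O exception");
        }
    }

    public static void main(String[] args) {
        Map<String, String> entries = new HashMap<>();
        entries.put("westminster", "westminster(1) locations: [27,6]");
        // Canned input: one hit, one miss, then an empty line to quit.
        BufferedReader in = new BufferedReader(new StringReader("westminster\nimpeach\n\n"));
        run(in, System.out, entries);
        System.out.println();
    }
}
```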
Sub Task: Concordance Entries
Design an appropriate Java class to store each concordance
entry. This class should be a revision of last week's
WordFreq class (or an extension of it). This class needs to
store the word, its count, and a list of the locations of its entries.
Here's how my solution displays an entry:
to(7) locations: [1,5] [10,6] [18,7] [22,7] [31,12] [33,5] [34,1]
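To illustrate how a toString() method can produce that display format, here is a minimal sketch of an entry class. It is not the CEntry or WordFreq class from the course: the name EntrySketch and its methods are hypothetical, and deriving the count from the list size rather than storing it in a separate field is just one design choice.

```java
import java.util.LinkedList;

// Hypothetical sketch: not the course's CEntry or WordFreq class.
class EntrySketch {
    private final String word;
    // Locations are kept as "[l,w]" strings, in order of occurrence.
    private final LinkedList<String> locations = new LinkedList<>();

    EntrySketch(String word) {
        this.word = word;
    }

    // Record one occurrence at the given line and word position.
    void addOccurrence(int line, int position) {
        locations.add("[" + line + "," + position + "]");
    }

    // The count is the number of recorded locations.
    int getCount() {
        return locations.size();
    }

    @Override
    public String toString() {
        return word + "(" + getCount() + ") locations: " + String.join(" ", locations);
    }

    public static void main(String[] args) {
        EntrySketch entry = new EntrySketch("to");
        entry.addOccurrence(1, 5);
        entry.addOccurrence(10, 6);
        System.out.println(entry);  // prints: to(2) locations: [1,5] [10,6]
    }
}
```

Keeping the fields private and exposing only addOccurrence(), getCount(), and toString() means the count cannot drift out of step with the list.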
Grading
You will be graded on whether your program works correctly and
efficiently, is well designed, uses appropriate data structures and
algorithms, is well documented, and is completed within the lab
period. Among the design considerations that I will be looking for
are whether you make proper use of various object-oriented concepts
and principles, such as the toString() method, the
distinction between public and private, and so on. Because there is
always a chance that you may not completely finish the project, you
should document your code as you go. That way you can receive partial
credit for documentation.
You're done. Great work!