CPSC 110-08: Computing on Mobile Phones
This lesson is based on materials developed and made available by Tammy Pirmann of Springfield Twp High School as part of her CS Principles course. The data sets were the result of an NSF-funded collaboration between Tammy and Slobodan Vucetic of Temple University.
The data sets we are using are made available through the generosity of Steve Glassman of the Compaq Systems Research Center. They were gathered in the late 1990s as part of a research project conducted by Digital Equipment Corporation, which collected data from users about their movie preferences. There are three data files:
Source: Details about the data files are taken from here.
Person data:
ID: Number -- primary key
Age: Number
Gender: Text -- one of "M", "F"
Zip_Code: Text

Movie data:
ID: Number -- primary key
Name: Text
PR_URL: Text -- URL of studio PR site
IMDb_URL: Text -- URL of Internet Movie Database entry
Theater_Status: Text -- either "old" or "current"
Theater_Release: Date/Time
Video_Status: Text -- either "old" or "current"
Video_Release: Date/Time
Action, Animation, Art_Foreign, Classic, Comedy, Drama, Family, Horror, Romance, Thriller: Yes/No

Rating data:
Person_ID: Number
Movie_ID: Number
Score: Number -- 0 <= Score <= 1
Weight: Number -- 0 < Weight <= 1
Modified: Date/Time
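To make the field lists above more concrete, here is a minimal Python sketch of how a person record and a rating record could be represented and checked. The class names `Person` and `Vote` and the helper `is_valid_vote` are hypothetical names chosen for illustration; only the field names and the Score and Weight ranges come from the lists above. Treating Person_ID and Movie_ID as references to the ID fields of the other two files is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Person:
    id: int          # primary key
    age: int
    gender: str      # one of "M", "F"
    zip_code: str    # listed as Text; text keeps any leading zeros


@dataclass
class Vote:
    person_id: int   # assumed to match a Person ID
    movie_id: int    # assumed to match a movie ID
    score: float     # 0 <= score <= 1
    weight: float    # 0 < weight <= 1
    modified: datetime


def is_valid_vote(vote: Vote) -> bool:
    """Check the Score and Weight ranges stated in the field list."""
    return 0 <= vote.score <= 1 and 0 < vote.weight <= 1


# Made-up sample values, for illustration only.
sample = Vote(person_id=1, movie_id=2, score=0.8, weight=1.0,
              modified=datetime(1997, 1, 1))
print(is_valid_vote(sample))
```

Note that a Weight of exactly 0 fails the check, while a Score of exactly 0 passes, which mirrors the strict and non-strict inequalities in the list.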
A movie's score is the rating provided by this person for this movie. The zero-to-five star rating used externally on EachMovie is mapped linearly to the interval [0,1]. Here's a histogram of the Score values:
Score Count
0 347191
0.2 150495
0.4 339718
0.6 701236
0.8 761676
1.0 511667
In other words, voters were asked to rate movies from awful (0) to great (1), with four intermediate ratings.
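The linear mapping and the histogram above can be explored with a short Python sketch. The function name `stars_to_score` and the variable names are hypothetical; the counts are copied from the histogram. The sketch assumes whole-star ratings from 0 to 5, which matches the six Score values listed.

```python
def stars_to_score(stars: int) -> float:
    """Map a 0-to-5 star rating linearly onto the interval [0, 1]."""
    if not 0 <= stars <= 5:
        raise ValueError("stars must be between 0 and 5")
    return stars / 5


# Histogram copied from the table above: score -> number of ratings.
SCORE_COUNTS = {
    0.0: 347191,
    0.2: 150495,
    0.4: 339718,
    0.6: 701236,
    0.8: 761676,
    1.0: 511667,
}

# Total number of ratings and the count-weighted average score.
total = sum(SCORE_COUNTS.values())
mean_score = sum(score * count for score, count in SCORE_COUNTS.items()) / total

print(total)                 # 2811983
print(round(mean_score, 3))  # 0.607
```

The average falls just above 0.6, the score for three stars, which reflects how the histogram is weighted toward the higher scores.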
Open each of the data files in a separate browser tab. You won't be editing the files, just browsing them and answering questions about them.
Answer each of the following questions: