Inductive Learning: Construct a classification rule in some predefined attribute language based on a training set of positive (+) and negative (-) instances.
Toy Example: Given the following inputs describing the stature, hair color, and eye color of 8 subjects, the ID3 algorithm will generate the decision tree described below:
No. | Height | Hair Color | Eye Color | Class |
---|---|---|---|---|
1 | short | blond | blue | + |
2 | tall | blond | brown | - |
3 | tall | red | blue | + |
4 | tall | dark | brown | - |
5 | short | dark | blue | - |
6 | tall | dark | blue | - |
7 | tall | blond | blue | + |
8 | short | blond | brown | - |
The tree classifies the positive and negative examples into two categories: redheads or blue-eyed blonds (+) versus dark-haired subjects or brown-eyed blonds (-). Note that the height of these subjects is irrelevant.
A larger example is the credit risk problem: the table below gives 14 loan applicants, each described by credit history, debt, collateral, and income, and labeled with a credit risk (high, moderate, or low).

No. | Risk | Credit History | Debt | Collateral | Income |
---|---|---|---|---|---|
1 | high | bad | high | none | $0 to $15K |
2 | high | unknown | high | none | $15 to $35K |
3 | moderate | unknown | low | none | $15 to $35K |
4 | high | unknown | low | none | $0 to $15K |
5 | low | unknown | low | none | over $35K |
6 | low | unknown | low | adequate | over $35K |
7 | high | bad | low | none | $0 to $15K |
8 | moderate | bad | low | adequate | over $35K |
9 | low | good | low | none | over $35K |
10 | low | good | high | adequate | over $35K |
11 | high | good | high | none | $0 to $15K |
12 | moderate | good | high | none | $15 to $35K |
13 | low | good | high | none | over $35K |
14 | high | bad | high | none | $15 to $35K |
function induce_decision_tree(Examples, Properties) {
    if all entries in Examples are in the same class
        return a leaf node labeled with that class
    else if Properties is empty
        return a leaf node labeled with the disjunction of all classes in Examples
    else {
        select a property P and make it the root of the current tree
        delete P from Properties
        for each value V of P {
            create a branch labeled V
            let partition[V] be the elements of Examples with value V for property P
            call induce_decision_tree(partition[V], Properties) and attach the result to branch V
        }
    }
}
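A minimal Python sketch of this pseudocode (an illustration, not part of the original notes): examples are assumed to be dicts with a "class" key plus one key per property, and select_property is any caller-supplied heuristic, such as the information-gain rule described below.

```python
def induce_decision_tree(examples, properties, select_property):
    """Recursively build a decision tree following the pseudocode above."""
    classes = {e["class"] for e in examples}
    if len(classes) == 1:                      # all examples in the same class
        return classes.pop()                   # leaf labeled with that class
    if not properties:                         # no properties left to split on
        return " or ".join(sorted(classes))    # leaf labeled with the disjunction of classes
    p = select_property(examples, properties)  # property chosen as the root of this subtree
    remaining = [q for q in properties if q != p]
    tree = {p: {}}
    for v in {e[p] for e in examples}:         # one branch per observed value of p
        partition = [e for e in examples if e[p] == v]
        tree[p][v] = induce_decision_tree(partition, remaining, select_property)
    return tree
```

With the information-gain heuristic defined later plugged in as select_property, running this on the eight toy examples should reproduce the hair-color/eye-color tree described above.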
As the in-class exercise reveals, choosing a different property as the root of the tree yields a different tree.
Information Theory (Shannon, 1948) can be used to select the property that contributes the largest information gain as the root of each subtree.
Information Content: The amount of information in a message
is a function of the probability of the occurrence of each possible
message. Given a universe of messages M = {m1, m2, ..., mn}
and a probability p(mi) for the occurrence of each message,
the information content of a message in M is given by:
I(M) = sum over i = 1..n of -p(mi) log2(p(mi))
Information content is measured in bits, but this is not the same notion of bit as in binary digit.
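A small Python sketch of this formula (an illustration, not from the original notes), assuming the probabilities p(mi) are estimated as relative frequencies from a list of example labels:

```python
import math

def information_content(labels):
    """I(M): sum of -p(mi) * log2(p(mi)), with p estimated from label frequencies."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return sum(-(k / n) * math.log2(k / n) for k in counts.values())
```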
Coin Toss Example. The amount of information in a fair coin toss is
I(fair_coin) = -p(heads)log2(p(heads)) - p(tails)log2(p(tails))
= -(1/2)log2(1/2) - (1/2)log2(1/2) = 1 bit
Unfair Coin Toss. The amount of information for an unfair coin that has
a 75% chance of landing heads would be:
I(unfair_coin) = -(3/4)log2(3/4) - (1/4)log2(1/4) = 0.811 bits
Because knowing the bias would already let you bet correctly 3/4 of the time, a message about the outcome of a given toss of the unfair coin is worth less than a message about the outcome of a toss of a fair coin.
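Both coin figures can be reproduced with the information_content sketch above; the label lists below are just a convenient encoding of the stated probabilities:

```python
fair_coin = ["heads", "tails"]                      # p(heads) = p(tails) = 1/2
unfair_coin = ["heads", "heads", "heads", "tails"]  # p(heads) = 3/4, p(tails) = 1/4
print(information_content(fair_coin))    # 1.0 bit
print(information_content(unfair_coin))  # ~0.811 bits
```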
Rule: At each level of the tree, choose the property that maximizes information gain.
We won't go into the details of computing information gain, other
than to say that for the root of the tree, the information gain for a
property P is the difference between the total amount of information
in the training set and the expected amount of information still needed
after partitioning on that property, E(P):
gain(P) = I(Credit_Risk_Table) - E(P)
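E(P) is not spelled out above; the sketch below assumes the standard ID3 definition, namely the size-weighted average of I over the partitions of the examples by P's values. The resulting select_by_gain function is one possible select_property heuristic for the induce_decision_tree sketch given earlier.

```python
def expected_information(examples, prop):
    """E(P): weighted average information of the partitions induced by property prop."""
    n = len(examples)
    total = 0.0
    for v in {e[prop] for e in examples}:
        part = [e["class"] for e in examples if e[prop] == v]
        total += (len(part) / n) * information_content(part)
    return total

def gain(examples, prop):
    """gain(P) = I(examples) - E(P)."""
    labels = [e["class"] for e in examples]
    return information_content(labels) - expected_information(examples, prop)

def select_by_gain(examples, properties):
    """Choose the property with the largest information gain."""
    return max(properties, key=lambda p: gain(examples, p))
```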
The results for the Credit Risk Problem are:
Property | Information Gain |
---|---|
Income | 0.967 bits |
Credit History | 0.266 bits |
Debt | 0.063 bits |
Collateral | 0.206 bits |
Given these values, we would choose Income as the root property and reapply the algorithm at each subsequent level. This leads to a much smaller decision tree than our original tree for the Credit Risk Problem.
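As a check (an illustration, not from the original notes), the credit risk table can be encoded and fed to the gain sketch above; the printed values match the gains listed, and select_by_gain picks Income as the root:

```python
ROWS = [
    # (risk, history, debt, collateral, income)
    ("high",     "bad",     "high", "none",     "$0 to $15K"),
    ("high",     "unknown", "high", "none",     "$15 to $35K"),
    ("moderate", "unknown", "low",  "none",     "$15 to $35K"),
    ("high",     "unknown", "low",  "none",     "$0 to $15K"),
    ("low",      "unknown", "low",  "none",     "over $35K"),
    ("low",      "unknown", "low",  "adequate", "over $35K"),
    ("high",     "bad",     "low",  "none",     "$0 to $15K"),
    ("moderate", "bad",     "low",  "adequate", "over $35K"),
    ("low",      "good",    "low",  "none",     "over $35K"),
    ("low",      "good",    "high", "adequate", "over $35K"),
    ("high",     "good",    "high", "none",     "$0 to $15K"),
    ("moderate", "good",    "high", "none",     "$15 to $35K"),
    ("low",      "good",    "high", "none",     "over $35K"),
    ("high",     "bad",     "high", "none",     "$15 to $35K"),
]
examples = [
    {"class": r, "history": h, "debt": d, "collateral": c, "income": i}
    for r, h, d, c, i in ROWS
]
for prop in ("income", "history", "debt", "collateral"):
    print(prop, round(gain(examples, prop), 3))  # 0.967, 0.266, 0.063, 0.206

tree = induce_decision_tree(
    examples, ["history", "debt", "collateral", "income"], select_by_gain
)
```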
ID3 was also evaluated empirically, on Quinlan's task of classifying chess end-game positions. Positions were described by 23 primitive attributes, such as an inability to move the king safely. Training on random samples of increasing size gave the following results:
Training Set Size | % of Universe | Errors in 10,000 Trials | Predicted Errors |
---|---|---|---|
200 | 0.01 | 199 | 728 |
1,000 | 0.07 | 33 | 146 |
5,000 | 0.36 | 8 | 29 |
25,000 | 1.79 | 6 | 7 |
125,000 | 8.93 | 2 | 1 |
Quinlan's C4.5 (1993) is a refinement of ID3 that addresses some of its limitations, such as handling continuous-valued attributes and missing values, and pruning the tree to avoid overfitting.