CPSC 352 -- Artificial Intelligence
Notes: ID3 Algorithm

Concept Learning: The ID3 Algorithm

Inductive Learning: Construct a classification rule in some predefined attribute language based on a training set of positive (+) and negative (-) instances.

Toy Example: Given the following inputs describing the stature, hair color, and eye color of 8 subjects, the ID3 algorithm generates the decision tree sketched below:

[1, short, blond, blue:  +]    [5, short, dark,  blue:  -]
[2, tall,  blond, brown: -]    [6, tall,  dark,  blue:  -]
[3, tall,  red,   blue:  +]    [7, tall,  blond, blue:  +]
[4, tall,  dark,  brown: -]    [8, short, blond, brown: -]
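
hair color?
 |-- blond --> eye color?
 |               |-- blue  --> +
 |               '-- brown --> -
 |-- red  --> +
 '-- dark --> -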

The tree classifies the positive and negative examples into two categories: redheads or blue-eyed blonds (+) versus dark-haired subjects or brown-eyed blonds (-). Note that the stature of the subjects is irrelevant.

Decision Tree for Credit Risk

The training data consists of 14 loan applications, each labeled with a credit risk:
No.  Risk      History  Debt  Collateral  Income
1    high      bad      high  none        $0 to $15K
2    high      unknown  high  none        $15 to $35K
3    moderate  unknown  low   none        $15 to $35K
4    high      unknown  low   none        $0 to $15K
5    low       unknown  low   none        over $35K
6    low       unknown  low   adequate    over $35K
7    high      bad      low   none        $0 to $15K
8    moderate  bad      low   adequate    over $35K
9    low       good     low   none        over $35K
10   low       good     high  adequate    over $35K
11   high      good     high  none        $0 to $15K
12   moderate  good     high  none        $15 to $35K
13   low       good     high  none        over $35K
14   high      bad      high  none        $15 to $35K

ID3 Algorithm

Expressed as runnable Python (the pseudocode's arbitrary "select a property P" is rendered here as taking the first property listed; ID3 proper selects by information gain, as discussed below):

def induce_decision_tree(examples, properties):
    """examples: list of (attributes, label) pairs, where attributes is a dict;
    properties: list of property names still available for testing."""
    classes = {label for _, label in examples}
    # if all entries in examples are in the same class, return a leaf node
    if len(classes) == 1:
        return classes.pop()
    # if no properties remain, return a leaf labeled with the disjunction of classes
    if not properties:
        return " or ".join(sorted(classes))
    # select a property P and make it the root of the current (sub)tree
    p = properties[0]
    remaining = [q for q in properties if q != p]   # delete P from properties
    tree = {p: {}}
    # create a branch for each value V of P and recurse on that partition
    for v in {attrs[p] for attrs, _ in examples}:
        partition = [(a, c) for a, c in examples if a[p] == v]
        tree[p][v] = induce_decision_tree(partition, remaining)
    return tree
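
As a quick check, running the sketch above on the toy example (the attribute names stature/hair/eyes are my own encoding) with hair color listed first reproduces the tree described earlier:

toy_examples = [
    ({"stature": "short", "hair": "blond", "eyes": "blue"},  "+"),
    ({"stature": "tall",  "hair": "blond", "eyes": "brown"}, "-"),
    ({"stature": "tall",  "hair": "red",   "eyes": "blue"},  "+"),
    ({"stature": "tall",  "hair": "dark",  "eyes": "brown"}, "-"),
    ({"stature": "short", "hair": "dark",  "eyes": "blue"},  "-"),
    ({"stature": "tall",  "hair": "dark",  "eyes": "blue"},  "-"),
    ({"stature": "tall",  "hair": "blond", "eyes": "blue"},  "+"),
    ({"stature": "short", "hair": "blond", "eyes": "brown"}, "-"),
]

print(induce_decision_tree(toy_examples, ["hair", "eyes", "stature"]))
# {'hair': {'red': '+', 'dark': '-',
#           'blond': {'eyes': {'blue': '+', 'brown': '-'}}}}   (branch order may vary)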

In-Class Exercise

Build a decision tree for the credit risk problem starting with the income property.

Choosing Decision Tree Properties

As the in-class exercise reveals, choosing a different property as the root of the tree yields a different tree.

Information Theory (Shannon, 1948) can be used to select the property that contributes the largest information gain as the root of each subtree.

Information Content: The amount of information in a message is a function of the probability of occurrence of each possible message. Given a universe of messages M = {m1, m2, ..., mn} and a probability p(mi) for the occurrence of each message mi, the information content of a message in M is given by:

I(M) = sum from i = 1 to n of -p(mi) * log2(p(mi))

Information content is measured in bits, but an information-theoretic bit is not the same notion as a bit of computer storage (a binary digit).

Coin Toss Example. The amount of information in a fair coin toss is

I(fair_coin) = -p(heads) * log2(p(heads)) - p(tails) * log2(p(tails))
             = -(1/2) * log2(1/2) - (1/2) * log2(1/2) = 1 bit

Unfair Coin Toss. The amount of information for an unfair coin that has a 75% chance of coming up heads is:

I(unfair_coin) = -(3/4) * log2(3/4) - (1/4) * log2(1/4) = 0.811 bits

Because knowing the bias would let you bet correctly 3/4 of the time, a message about the outcome of a given toss of the unfair coin is worth less than a message about the outcome of a toss of a fair coin.
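
Both coin values follow directly from the formula; here is a minimal Python check (the function name info is my own):

import math

def info(probs):
    # I(M) = sum over i of -p(mi) * log2(p(mi))
    return -sum(p * math.log2(p) for p in probs)

print(info([1/2, 1/2]))    # fair coin   -> 1.0 bit
print(info([3/4, 1/4]))    # unfair coin -> 0.811... bits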

Applying Information Theory to ID3

Rule: At each level of the tree, choose the property that maximizes information gain.

We won't go into the details of computing information gain, other than to say that for the root of the tree, the information gain of a property P is the difference between the total amount of information in the table and the expected amount of information still needed after the examples are partitioned on P:

gain(P) = I(Credit_Risk_Table) - E(P)

where E(P) is the average information content of the partitions created by P, weighted by partition size.
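
To make the computation concrete, here is a self-contained Python sketch that evaluates gain(P) for every property of the credit risk table above (the tuple encoding and function names are my own):

import math
from collections import Counter

# The 14 training examples: (risk, history, debt, collateral, income)
TABLE = [
    ("high",     "bad",     "high", "none",     "$0 to $15K"),
    ("high",     "unknown", "high", "none",     "$15 to $35K"),
    ("moderate", "unknown", "low",  "none",     "$15 to $35K"),
    ("high",     "unknown", "low",  "none",     "$0 to $15K"),
    ("low",      "unknown", "low",  "none",     "over $35K"),
    ("low",      "unknown", "low",  "adequate", "over $35K"),
    ("high",     "bad",     "low",  "none",     "$0 to $15K"),
    ("moderate", "bad",     "low",  "adequate", "over $35K"),
    ("low",      "good",    "low",  "none",     "over $35K"),
    ("low",      "good",    "high", "adequate", "over $35K"),
    ("high",     "good",    "high", "none",     "$0 to $15K"),
    ("moderate", "good",    "high", "none",     "$15 to $35K"),
    ("low",      "good",    "high", "none",     "over $35K"),
    ("high",     "bad",     "high", "none",     "$15 to $35K"),
]
COLS = {"Credit History": 1, "Debt": 2, "Collateral": 3, "Income": 4}

def info(rows):
    # I(M) over the risk classification (column 0) of rows
    n = len(rows)
    counts = Counter(r[0] for r in rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def gain(rows, prop):
    # gain(P) = I(rows) - E(P), with E(P) the size-weighted info of each partition
    col = COLS[prop]
    expected = 0.0
    for v in {r[col] for r in rows}:
        part = [r for r in rows if r[col] == v]
        expected += len(part) / len(rows) * info(part)
    return info(rows) - expected

for p in COLS:
    print(f"{p}: {gain(TABLE, p):.3f} bits")
# Income: 0.967, Credit History: 0.266, Debt: 0.063, Collateral: 0.206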

The results for the Credit Risk Problem are:

Property         Information Gain
Income           0.967 bits
Credit History   0.266 bits
Debt             0.063 bits
Collateral       0.206 bits

Given these values, we would choose Income as the root property and reapply the algorithm at each subsequent level. This leads to a much smaller decision tree than our original tree for the credit risk problem.
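
In the Python sketch of induce_decision_tree given earlier, this rule amounts to replacing the arbitrary property-selection line (assuming a gain helper like the one above, adapted to that function's (attributes, label) example format):

# instead of:  p = properties[0]
p = max(properties, key=lambda q: gain(examples, q))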

Evaluating ID3

Quinlan (1983) evaluated ID3's performance on learning to classify boards in a chess endgame involving a White King and Rook against a Black King and Knight. The goal was to recognize boards on which Black loses within three moves.

There were 23 primitive attributes, such as an inability to move the King safely.

Training Set Size  % of Universe  Errors in 10,000 Trials  Predicted Errors
200                0.01           199                      728
1,000              0.07           33                       146
5,000              0.36           8                        29
25,000             1.79           6                        7
125,000            8.93           2                        1

Issues That Affect ID3

The basic algorithm, as presented here, leaves several practical issues open: continuous-valued properties (such as income as a number rather than a range) must be discretized; training data may contain noisy or missing attribute values; the induced tree can overfit the training data, which calls for pruning; and information gain is biased toward properties with many values.

Quinlan's C4.5 (1996) is a refinement of ID3 that addresses some of these issues.