Notes: Version Space Search

CPSC 352 -- Artificial Intelligence
Notes: Version Space Search

Introduction

The version space search algorithm (Mitchell,1982) is a form of inductive learning implemented as search through a concept space.

In our discussion, we use PROLOG syntax to represent concepts. So color(ball, red) represents the fact that the color of the ball is red.

General Idea: Starting from an initial concept or set of concepts, we generalize and/or specialize in response to each positive or negative training instance. The learned concept must be general enough to cover all positive examples and specific enough to exclude all negative examples.

Rule: Positive Example --> Generalize the candidate concept(s)

Rule: Negative Example --> Specialize the candidate concept(s)

Def: A concept c is maximally specific if it covers all positive examples, no negative examples, and for any other concept c' that covers the positive examples, c <= c'. This means that c is at least as specific than c'. The set of sentences that match c is no greater than the set that matches c'.

Def: A concept c is maximally general if it covers none of the negative examples and for any other concept c' that covers no negative examples, c >= c'. In other words, c is at least as general c'. The set of sentences that match c is no less than the set that matches c'.

Generalization Operators

Replace constants with variables: color(ball,red) --> color(X,red)
Drop conditions from a conjunction: shape(X,round) ^ size(X,small) ^ color(X,red) --> shape(X,round) ^ color(X,red)
Add a disjunct: shape(X,round) ^ size(X,small) ^ color(X,red) --> shape(X,round) ^ size(X,small) ^ (color(X,red) v color(X,blue))
Replace a property with its parent: color(X,red) --> color(X,primary_color)

Candidate Elimination Algorithm (Specific to General)

Begin (Specific to General)
   Initialize S to the first positive training instance
   Let N be the set of all negative instances seen so far.
   
   For each positive instance, p {
       For every s in S, if s does not match p, replace s with its most specific generalizations 
         that match p.
       Delete from S all hypotheses more general than some other hypothesis in S
       Delete from S all hypotheses that match a previously observed negative instance in N.
   }

   For every negative instance, n {
       Delete all members of S that match n.
       Add n to N to check future hypotheses for overgeneralization.
   }
End

Example

0 S = {}, N = {}

1 Positive: obj(small,red,ball) S = {obj(small,red,ball)}, N = {}

2 Positive: obj(small,white,ball) S = {obj(small,X,ball)}, N = {}

3 Positive: obj(large,blue,ball) S = {obj(X,X,ball)}, N = {}

Candidate Elimination Algorithm (General to Specific)

Begin (General to Specific)
   Initialize G to contain the most general concept of the space
   Let P be the set of all positive instances seen so far.
   
   For each negative instance, n {
       For every g in G, if g matches n, replace g with its most general specializations
         that do not match n.
       Delete from G all hypotheses more specific than some other hypothesis in G
       Delete from G all hypotheses that fail to match some positive instance in P.
   }

   For every positive instance, p {
       Delete all hypotheses in G that fail to match p.
       Add p to P.
   }
End

Example

NOTE: Background knowledge includes size= {large,small}, color={red,white,blue}, shape={ball,brick,cube}.

0 G = {obj(X,Y,Z)}, P = {}

1 Negative: obj(small,red,brick) G = {obj(large,Y,Z),obj(X,white,Z),obj(X,blue,Z),obj(X,Y,ball),obj(X,Y,cube)}, P = {}

2 Positive: obj(large,white,ball) G = {obj(large,Y,Z),obj(X,white,Z),obj(X,Y,ball)}, P = {obj(large,white,ball)}

3 Negative: obj(large,blue,cube) G = {obj(large,white,Z),obj(large,red,Z),obj(large,Y,ball),obj(large,Y,brick),obj(X,white,Z),obj(X,Y,ball)}, P = {obj(large,white,ball)}

Remove obj(large,white,Z) and obj(large,Y,ball) because more specific than obj(X,white,Z) and obj(X,Y,ball)

Remove obj(large,red,Z) and obj(large,Y,brick) because don't match any positive instance

G = {obj(X,white,Z),obj(X,Y,ball)}, P = {obj(large,white,ball)}

4 Positive: obj(small,blue,ball) G = {obj(X,Y,ball)}, P = {obj(large,white,ball),obj(small,blue,ball)}

Bi-directional Search

The candidate elimination algorithm combines the two algorithms into a bi-directional search of the space. The search concludes when G = S and both are singletons, the algorithm succeeds in finding a target concept. When S = G = {}, the algorithm fails.

In-class Exercises

Perform specific-to-general search on the following set of training examples.
1. o(small,red,ball) +
2. o(large,blue,ball) +
3. o(small,red,cube) -
4. o(large,white,brick) -
5. o(small,white,brick) -
6. o(small,white,ball) +
Perform general-to-specific search on the following set of training examples.
1. o(small,red,ball) +
2. o(large,blue,ball) +
3. o(small,red,cube) -
4. o(large,white,brick) -
5. o(small,white,brick) -
6. o(small,white,ball) +

Click here for a solution!

Try these two examples using the PROLOG program PROLOG Version Space Search. To run the first example, load the program into PROLOG and type do_spec_to_gen. For the second example, type do_gen_to_spec.

0		S = {}, N = {}
1	Positive: obj(small,red,ball)	S = {obj(small,red,ball)}, N = {}
2	Positive: obj(small,white,ball)	S = {obj(small,X,ball)}, N = {}
3	Positive: obj(large,blue,ball)	S = {obj(X,X,ball)}, N = {}

0		G = {obj(X,Y,Z)}, P = {}
1	Negative: obj(small,red,brick)	G = {obj(large,Y,Z),obj(X,white,Z),obj(X,blue,Z),obj(X,Y,ball),obj(X,Y,cube)}, P = {}
2	Positive: obj(large,white,ball)	G = {obj(large,Y,Z),obj(X,white,Z),obj(X,Y,ball)}, P = {obj(large,white,ball)}
3	Negative: obj(large,blue,cube)	G = {obj(large,white,Z),obj(large,red,Z),obj(large,Y,ball),obj(large,Y,brick),obj(X,white,Z),obj(X,Y,ball)}, P = {obj(large,white,ball)}
		Remove obj(large,white,Z) and obj(large,Y,ball) because more specific than obj(X,white,Z) and obj(X,Y,ball)
		Remove obj(large,red,Z) and obj(large,Y,brick) because don't match any positive instance
		G = {obj(X,white,Z),obj(X,Y,ball)}, P = {obj(large,white,ball)}
4	Positive: obj(small,blue,ball)	G = {obj(X,Y,ball)}, P = {obj(large,white,ball),obj(small,blue,ball)}