Notes: Machine Learning: Neural Networks

Definitions:

- **c**: learning-rate parameter
- **d**: desired output value
- **signal()**: the perceptron's actual output value, which is always +1 or -1

Cases for the weight-update rule w_{i} = w_{i} + c(d - signal)x_{i}:

- d - signal = 0 **==>** do nothing
- d - signal = +2 **==>** increment w_{i} by 2cx_{i}
- d - signal = -2 **==>** decrement w_{i} by 2cx_{i}

By repeatedly adjusting weights in this fashion for an entire set of
**training data**, the perceptron will minimize the average error
over the entire set.

Minsky and Papert (1969) showed that if there is a set of weights that gives the correct output for an entire training set, a perceptron will learn it. They also showed the key limitation: a single perceptron can only represent functions that are **linearly separable**, so it cannot learn XOR.

**Example:** Perceptrons can learn models for the following
primitive boolean functions: AND, OR, NOT, NAND, NOR. Here's an
example for AND:
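The learning rule above can be sketched in code. The following is a minimal, illustrative perceptron (class and method names are my own, not from the notes) trained on AND with bipolar (+1/-1) inputs and outputs:

```java
// Perceptron learning AND, using the update rule from the notes:
// w_i += c * (d - signal) * x_i, which is 0, +2c*x_i, or -2c*x_i.
public class PerceptronAnd {
    static double[] w = {0.0, 0.0, 0.0}; // bias weight plus two input weights
    static final double C = 0.1;         // learning-rate parameter c

    // Hard-limiting threshold activation: always +1 or -1.
    static int signal(double[] x) {
        double net = 0.0;
        for (int i = 0; i < w.length; i++) net += w[i] * x[i];
        return net >= 0 ? 1 : -1;
    }

    static void train() {
        // AND truth table in bipolar form; x[0] = 1 is the bias input.
        double[][] xs = {{1, -1, -1}, {1, -1, 1}, {1, 1, -1}, {1, 1, 1}};
        int[] ds = {-1, -1, -1, 1};  // desired outputs d
        for (int epoch = 0; epoch < 20; epoch++) {
            for (int p = 0; p < xs.length; p++) {
                int err = ds[p] - signal(xs[p]);  // 0, +2, or -2
                for (int i = 0; i < w.length; i++)
                    w[i] += C * err * xs[p][i];
            }
        }
    }

    public static void main(String[] args) {
        train();
        System.out.println(signal(new double[]{1, 1, 1}));   // prints 1
        System.out.println(signal(new double[]{1, 1, -1}));  // prints -1
    }
}
```

Because AND is linearly separable, the weights stop changing after the first epoch here; every later pass produces d - signal = 0 for all four patterns.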

The perceptron **activation function** is a **hard-limiting threshold function**.
A more general neural network uses a **continuous activation function**. One popular
function is the **sigmoidal (s-shaped) function**, such as the **logistic function:**

f(net) = 1 / (1 + e^{-λ*net})

where net = Σ w_{i}x_{i} is the weighted sum of the node's inputs and λ is a parameter controlling the steepness of the curve.
The **delta rule** is a learning rule for a network with a
continuous (and therefore differentiable) activation function. It attempts
to **minimize the cumulative error over a data set** as a function of
the weights in the network:

Error = (1/2) Σ_{i} (d_{i} - O_{i})^2

where d_{i} is the desired output and O_{i} is the network's actual output for the ith training example.

**Key Point:** The delta rule follows the **slope** of the cumulative
error in a particular region of the network's output function, adjusting the weights
downhill (gradient descent). This makes it susceptible to **local minima**.
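The delta rule for a single logistic unit can be sketched as follows. This is an illustrative example (the class name and the tiny OR data set are my own), showing that gradient descent on the squared error reduces the error over training:

```java
// A single logistic unit trained with the delta rule:
// gradient descent on Error = (1/2) * sum (d - O)^2.
public class DeltaRuleUnit {
    static double[] w = {0.1, -0.1, 0.05}; // bias + two input weights
    static final double C = 0.5;           // learning rate c

    static double logistic(double net) { return 1.0 / (1.0 + Math.exp(-net)); }

    static double output(double[] x) {
        double net = 0;
        for (int i = 0; i < w.length; i++) net += w[i] * x[i];
        return logistic(net);
    }

    // Cumulative squared error over the data set.
    static double error(double[][] xs, double[] ds) {
        double e = 0;
        for (int p = 0; p < xs.length; p++) {
            double diff = ds[p] - output(xs[p]);
            e += 0.5 * diff * diff;
        }
        return e;
    }

    // One pass of gradient descent: w_i += c * (d - O) * O * (1 - O) * x_i
    static void epoch(double[][] xs, double[] ds) {
        for (int p = 0; p < xs.length; p++) {
            double o = output(xs[p]);
            double delta = (ds[p] - o) * o * (1 - o);
            for (int i = 0; i < w.length; i++) w[i] += C * delta * xs[p][i];
        }
    }

    public static void main(String[] args) {
        // Logical OR with a bias input of 1; OR is learnable by one unit.
        double[][] xs = {{1, 0, 0}, {1, 0, 1}, {1, 1, 0}, {1, 1, 1}};
        double[] ds = {0, 1, 1, 1};
        double before = error(xs, ds);
        for (int e = 0; e < 1000; e++) epoch(xs, ds);
        System.out.println(error(xs, ds) < before);  // prints true
    }
}
```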

Back propagation starts at the output layer and propagates the error backwards
through the network. The learning rule is often called the **generalized delta rule**.

The back propagation activation function is the logistic function:

f(net) = 1 / (1 + e^{-λ*net})
The logistic function is useful for assigning error to the hidden layers in a multi-layer network because:

- It is **continuous** and has a derivative everywhere.
- It is **sigmoidal**.
- The derivative is greatest where the function is steepest, i.e., near an output of 0.5. This assigns the **most error** to nodes whose activation is least certain.
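The last property follows from the logistic derivative f'(net) = f(net)(1 - f(net)), which is maximized when the output is 0.5. A tiny illustrative check (class name is my own):

```java
public class LogisticDerivative {
    static double logistic(double net) { return 1.0 / (1.0 + Math.exp(-net)); }

    // Derivative of the logistic function: f'(net) = f(net) * (1 - f(net))
    static double dLogistic(double net) {
        double f = logistic(net);
        return f * (1 - f);
    }

    public static void main(String[] args) {
        // At net = 0 the output is 0.5 (least certain) and the
        // derivative is at its maximum of 0.25; it shrinks toward
        // the saturated extremes.
        System.out.println(dLogistic(0));                 // prints 0.25
        System.out.println(dLogistic(4) < dLogistic(0));  // prints true
    }
}
```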

The formulas for computing the adjustments of the kth weight of the ith node (with learning rate c and kth input x_{k}):

- Output node: Δw_{ki} = c(d_{i} - O_{i}) O_{i}(1 - O_{i}) x_{k}
- Hidden node: Δw_{ki} = c O_{i}(1 - O_{i}) (Σ_{j} δ_{j} w_{ij}) x_{k}, where δ_{j} is the error term of node j in the next layer and w_{ij} is the weight from node i to node j.

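To make the generalized delta rule concrete, here is a minimal illustrative 2-3-1 feed-forward network trained by back propagation on XOR (the class layout, seed, and hyperparameters are my own choices, not from the notes):

```java
import java.util.Random;

public class XorBackprop {
    static final int H = 3;                   // number of hidden units
    static double[][] wh = new double[H][3];  // hidden weights: bias + 2 inputs
    static double[] wo = new double[H + 1];   // output weights: bias + H hidden

    static final double[][] XS = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
    static final double[] DS = {0, 1, 1, 0};  // XOR targets

    static double logistic(double net) { return 1.0 / (1.0 + Math.exp(-net)); }

    // Forward pass; fills hOut with the hidden activations.
    static double forward(double x1, double x2, double[] hOut) {
        for (int j = 0; j < H; j++)
            hOut[j] = logistic(wh[j][0] + wh[j][1] * x1 + wh[j][2] * x2);
        double net = wo[0];
        for (int j = 0; j < H; j++) net += wo[j + 1] * hOut[j];
        return logistic(net);
    }

    // Cumulative squared error over the four XOR patterns.
    static double error() {
        double[] h = new double[H];
        double e = 0;
        for (int p = 0; p < XS.length; p++) {
            double diff = DS[p] - forward(XS[p][0], XS[p][1], h);
            e += 0.5 * diff * diff;
        }
        return e;
    }

    // Small random initial weights from a fixed seed, for repeatability.
    static void init(long seed) {
        Random r = new Random(seed);
        for (int j = 0; j < H; j++)
            for (int i = 0; i < 3; i++) wh[j][i] = r.nextDouble() - 0.5;
        for (int j = 0; j <= H; j++) wo[j] = r.nextDouble() - 0.5;
    }

    static void train(double c, int epochs) {
        double[] h = new double[H];
        for (int e = 0; e < epochs; e++) {
            for (int p = 0; p < XS.length; p++) {
                double o = forward(XS[p][0], XS[p][1], h);
                // Output-node error term: (d - O) * O * (1 - O)
                double dOut = (DS[p] - o) * o * (1 - o);
                for (int j = 0; j < H; j++) {
                    // Hidden-node error term: dOut propagated back through wo
                    double dHid = dOut * wo[j + 1] * h[j] * (1 - h[j]);
                    wh[j][0] += c * dHid;
                    wh[j][1] += c * dHid * XS[p][0];
                    wh[j][2] += c * dHid * XS[p][1];
                }
                // Output weights use the hidden activations as inputs.
                wo[0] += c * dOut;
                for (int j = 0; j < H; j++) wo[j + 1] += c * dOut * h[j];
            }
        }
    }

    public static void main(String[] args) {
        init(42);
        double before = error();
        train(0.5, 20000);
        System.out.println(error() < before);  // prints true
    }
}
```

Because gradient descent can get stuck in local minima, the demonstration checks only that the cumulative error decreased; with most seeds the network also ends up classifying all four XOR patterns correctly.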
**Example: NETtalk** (Sejnowski and Rosenberg, 1987)

- Learned to pronounce English text.
- Inputs: String of text, e.g. "I say hello to you" (7-letter window)
- Input Units: 29 units, one for each letter and 3 for punctuation and spaces
- Outputs: Phonemes (26 different ones)
- Hidden Units: 80 (these units learn the pronunciation rules)
- Connections: 18,629
- Learning rule: back propagation
- Interesting properties:
  - Performance improves with training, but at a decreasing rate.
  - Graceful degradation
  - Relearning was highly efficient

Comparison with the ID3 decision-tree learner:

- Both ID3 and NETtalk were able to pronounce 60% of the text correctly after 500 training examples
- ID3 required 1 pass through the training data
- NETtalk was allowed 100 passes through the 500 training examples

**Homework Exercise:** Using the links below, download the
Encog Framework into
a directory on your Linux account. Then perform the exercises.

**Downloads:** Download and unzip each of the following Encog
packages from the Encog Download Site:

- encog-workbench-3.0.1-release.zip
- encog-examples-3.0.1-release.zip
- encog-core-3.0.1-release.zip

**Exercises**

- Take a look at the Getting Started Documentation.
- **Command Line Exercise:** Do the Encog Java XORHelloWorld example. Try working through the ANT version. On my system, this is the Java command you need to run from within the `.../encog-examples-3.0.1/lib` directory:

  `java -cp encog-core-3.0.1-SNAPSHOT.jar:examples.jar org.encog.examples.neural.xor.XORHelloWorld`

- **GUI Exercise:** Do the Workbench Classification Example.