Notes: Machine Learning: Neural Networks

Definitions:

- **c**: learning-rate parameter
- **d**: desired output value
- **signal()**: the perceptron's actual output value, which is always +1 or -1

Cases for the weight-update rule w_{i} = w_{i} + c(d - signal)x_{i}:

- d - signal = 0 **==>** do nothing
- d - signal = +2 **==>** increment w_{i} by 2cx_{i}
- d - signal = -2 **==>** decrement w_{i} by 2cx_{i}

By repeatedly adjusting weights in this fashion for an entire set of
**training data**, the perceptron will minimize the average error
over the entire set.

Minsky and Papert (1969) showed that if there is a set of weights that gives the correct output for an entire training set, a perceptron will learn it. They also showed the key limitation: a single perceptron can only represent functions that are **linearly separable**, so it cannot learn XOR.

**Example:** Perceptrons can learn models for the following
primitive boolean functions: AND, OR, NOT, NAND, NOR. Here's an
example for AND:
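The learning rule above can be sketched in code. The following is a minimal, illustrative perceptron (class and method names are my own, not from the notes) trained on AND with bipolar (+1/-1) inputs and outputs:

```java
// Perceptron learning AND, using the update rule from the notes:
// w_i += c * (d - signal) * x_i, which is 0, +2c*x_i, or -2c*x_i.
public class PerceptronAnd {
    static double[] w = {0.0, 0.0, 0.0}; // bias weight plus two input weights
    static final double C = 0.1;         // learning-rate parameter c

    // Hard-limiting threshold activation: always +1 or -1.
    static int signal(double[] x) {
        double net = 0.0;
        for (int i = 0; i < w.length; i++) net += w[i] * x[i];
        return net >= 0 ? 1 : -1;
    }

    static void train() {
        // AND truth table in bipolar form; x[0] = 1 is the bias input.
        double[][] xs = {{1, -1, -1}, {1, -1, 1}, {1, 1, -1}, {1, 1, 1}};
        int[] ds = {-1, -1, -1, 1};  // desired outputs d
        for (int epoch = 0; epoch < 20; epoch++) {
            for (int p = 0; p < xs.length; p++) {
                int err = ds[p] - signal(xs[p]);  // 0, +2, or -2
                for (int i = 0; i < w.length; i++)
                    w[i] += C * err * xs[p][i];
            }
        }
    }

    public static void main(String[] args) {
        train();
        System.out.println(signal(new double[]{1, 1, 1}));   // prints 1
        System.out.println(signal(new double[]{1, 1, -1}));  // prints -1
    }
}
```

Because AND is linearly separable, the weights stop changing after the first epoch here; every later pass produces d - signal = 0 for all four patterns.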

The perceptron **activation function** is a **hard-limiting threshold function**.
A more general neural network uses a **continuous activation function**. One popular
function is the **sigmoidal (s-shaped) function**, such as the **logistic function:**

f(net) = 1 / (1 + e^{-λ*net})

where net = Σ w_{i}x_{i} is the weighted sum of the node's inputs and λ is a parameter controlling the steepness of the curve.
The **delta rule** is a learning rule for a network with a
continuous (and therefore differentiable) activation function. It attempts
to **minimize the cumulative error over a data set** as a function of
the weights in the network:

Error = (1/2) Σ_{i} (d_{i} - O_{i})^2

where d_{i} is the desired output and O_{i} is the network's actual output for the ith training example.

**Key Point:** The delta rule follows the **slope** of the cumulative
error in a particular region of the network's output function, adjusting the weights
downhill (gradient descent). This makes it susceptible to **local minima**.
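The delta rule for a single logistic unit can be sketched as follows. This is an illustrative example (the class name and the tiny OR data set are my own), showing that gradient descent on the squared error reduces the error over training:

```java
// A single logistic unit trained with the delta rule:
// gradient descent on Error = (1/2) * sum (d - O)^2.
public class DeltaRuleUnit {
    static double[] w = {0.1, -0.1, 0.05}; // bias + two input weights
    static final double C = 0.5;           // learning rate c

    static double logistic(double net) { return 1.0 / (1.0 + Math.exp(-net)); }

    static double output(double[] x) {
        double net = 0;
        for (int i = 0; i < w.length; i++) net += w[i] * x[i];
        return logistic(net);
    }

    // Cumulative squared error over the data set.
    static double error(double[][] xs, double[] ds) {
        double e = 0;
        for (int p = 0; p < xs.length; p++) {
            double diff = ds[p] - output(xs[p]);
            e += 0.5 * diff * diff;
        }
        return e;
    }

    // One pass of gradient descent: w_i += c * (d - O) * O * (1 - O) * x_i
    static void epoch(double[][] xs, double[] ds) {
        for (int p = 0; p < xs.length; p++) {
            double o = output(xs[p]);
            double delta = (ds[p] - o) * o * (1 - o);
            for (int i = 0; i < w.length; i++) w[i] += C * delta * xs[p][i];
        }
    }

    public static void main(String[] args) {
        // Logical OR with a bias input of 1; OR is learnable by one unit.
        double[][] xs = {{1, 0, 0}, {1, 0, 1}, {1, 1, 0}, {1, 1, 1}};
        double[] ds = {0, 1, 1, 1};
        double before = error(xs, ds);
        for (int e = 0; e < 1000; e++) epoch(xs, ds);
        System.out.println(error(xs, ds) < before);  // prints true
    }
}
```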

Back propagation starts at the output layer and propagates the error backwards
through the network. The learning rule is often called the **generalized delta rule**.

The back propagation activation function is the logistic function:

f(net) = 1 / (1 + e^{-λ*net})
The logistic function is useful for assigning error to the hidden layers in a multi-layer network because:

- It is **continuous** and has a derivative everywhere.
- It is **sigmoidal**.
- The derivative is greatest where the function is steepest, i.e., near an output of 0.5. This assigns the **most error** to nodes whose activation is least certain.
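The last property follows from the logistic derivative f'(net) = f(net)(1 - f(net)), which is maximized when the output is 0.5. A tiny illustrative check (class name is my own):

```java
public class LogisticDerivative {
    static double logistic(double net) { return 1.0 / (1.0 + Math.exp(-net)); }

    // Derivative of the logistic function: f'(net) = f(net) * (1 - f(net))
    static double dLogistic(double net) {
        double f = logistic(net);
        return f * (1 - f);
    }

    public static void main(String[] args) {
        // At net = 0 the output is 0.5 (least certain) and the
        // derivative is at its maximum of 0.25; it shrinks toward
        // the saturated extremes.
        System.out.println(dLogistic(0));                 // prints 0.25
        System.out.println(dLogistic(4) < dLogistic(0));  // prints true
    }
}
```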

The formulas for computing the adjustments of the kth weight of the ith node (with learning rate c and kth input x_{k}):

- Output node: Δw_{ki} = c(d_{i} - O_{i}) O_{i}(1 - O_{i}) x_{k}
- Hidden node: Δw_{ki} = c O_{i}(1 - O_{i}) (Σ_{j} δ_{j} w_{ij}) x_{k}, where δ_{j} is the error term of node j in the next layer and w_{ij} is the weight from node i to node j.

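To make the generalized delta rule concrete, here is a minimal illustrative 2-3-1 feed-forward network trained by back propagation on XOR (the class layout, seed, and hyperparameters are my own choices, not from the notes):

```java
import java.util.Random;

public class XorBackprop {
    static final int H = 3;                   // number of hidden units
    static double[][] wh = new double[H][3];  // hidden weights: bias + 2 inputs
    static double[] wo = new double[H + 1];   // output weights: bias + H hidden

    static final double[][] XS = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
    static final double[] DS = {0, 1, 1, 0};  // XOR targets

    static double logistic(double net) { return 1.0 / (1.0 + Math.exp(-net)); }

    // Forward pass; fills hOut with the hidden activations.
    static double forward(double x1, double x2, double[] hOut) {
        for (int j = 0; j < H; j++)
            hOut[j] = logistic(wh[j][0] + wh[j][1] * x1 + wh[j][2] * x2);
        double net = wo[0];
        for (int j = 0; j < H; j++) net += wo[j + 1] * hOut[j];
        return logistic(net);
    }

    // Cumulative squared error over the four XOR patterns.
    static double error() {
        double[] h = new double[H];
        double e = 0;
        for (int p = 0; p < XS.length; p++) {
            double diff = DS[p] - forward(XS[p][0], XS[p][1], h);
            e += 0.5 * diff * diff;
        }
        return e;
    }

    // Small random initial weights from a fixed seed, for repeatability.
    static void init(long seed) {
        Random r = new Random(seed);
        for (int j = 0; j < H; j++)
            for (int i = 0; i < 3; i++) wh[j][i] = r.nextDouble() - 0.5;
        for (int j = 0; j <= H; j++) wo[j] = r.nextDouble() - 0.5;
    }

    static void train(double c, int epochs) {
        double[] h = new double[H];
        for (int e = 0; e < epochs; e++) {
            for (int p = 0; p < XS.length; p++) {
                double o = forward(XS[p][0], XS[p][1], h);
                // Output-node error term: (d - O) * O * (1 - O)
                double dOut = (DS[p] - o) * o * (1 - o);
                for (int j = 0; j < H; j++) {
                    // Hidden-node error term: dOut propagated back through wo
                    double dHid = dOut * wo[j + 1] * h[j] * (1 - h[j]);
                    wh[j][0] += c * dHid;
                    wh[j][1] += c * dHid * XS[p][0];
                    wh[j][2] += c * dHid * XS[p][1];
                }
                // Output weights use the hidden activations as inputs.
                wo[0] += c * dOut;
                for (int j = 0; j < H; j++) wo[j + 1] += c * dOut * h[j];
            }
        }
    }

    public static void main(String[] args) {
        init(42);
        double before = error();
        train(0.5, 20000);
        System.out.println(error() < before);  // prints true
    }
}
```

Because gradient descent can get stuck in local minima, the demonstration checks only that the cumulative error decreased; with most seeds the network also ends up classifying all four XOR patterns correctly.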
**Example: NETtalk** (Sejnowski and Rosenberg, 1987)

- Learned to pronounce English text.
- Inputs: String of text, e.g. "I say hello to you" (7-letter window)
- Input Units: 29 units, one for each letter and 3 for punctuation and spaces
- Outputs: Phonemes (26 different ones)
- Hidden Units: 80 (these units learn the pronunciation rules)
- Connections: 18,629
- Learning rule: back propagation
- Interesting properties:
  - Performance improves with training, but at a decreasing rate.
  - Graceful degradation
  - Relearning was highly efficient

Comparison with the ID3 decision-tree learner:

- Both ID3 and NETtalk were able to pronounce 60% of the text correctly after 500 training examples
- ID3 required 1 pass through the training data
- NETtalk was allowed 100 passes through the 500 training examples

**Homework Exercise:** Using the links below, download the
Encog Framework into
a directory on your Linux account. Then perform the exercises.

**Downloads:** Download and unzip each of the following Encog
packages from the Encog Download Site:

- encog-workbench-3.0.1-release.zip
- encog-examples-3.0.1-release.zip
- encog-core-3.0.1-release.zip

**Exercises**

- Take a look at the Getting Started Documentation.
- **Command Line Exercise:** Do the Encog Java XORHelloWorld example. Try working through the ANT version. On my system, this is the Java command you need to run from within the `.../encog-examples-3.0.1/lib` directory:

  `java -cp encog-core-3.0.1-SNAPSHOT.jar:examples.jar org.encog.examples.neural.xor.XORHelloWorld`

- **GUI Exercise:** Do the Workbench Classification Example.