Chapter 3

Neural networks, one neuron at a time

A neural network is a stack of linear layers separated by non-linear activation functions. That's it. The richness comes from how those simple pieces compose. We'll build it up from a single neuron — no skipping.

1. A single neuron, end to end

One neuron does three things in order (a code sketch follows the list):

  1. Weighted sum. Take the input vector $x$, dot it with a weight vector $w$, add a bias $b$: $z = w \cdot x + b$.
  2. Activation. Squash the result through a non-linear function (sigmoid, ReLU, tanh, etc.).
  3. Output. The squashed number is what the neuron passes on.
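Here are those three steps as runnable NumPy (a minimal sketch; sigmoid as the activation and the names neuron_forward and sigmoid are my choices):

```python
import numpy as np

def sigmoid(z):
    # Squash any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def neuron_forward(x, w, b):
    z = np.dot(w, x) + b   # 1. weighted sum
    return sigmoid(z)      # 2-3. activation, output

x = np.array([0.5, 2.0])    # a 2-D input point
w = np.array([1.0, -1.0])   # the demo's initial weights
b = 0.0                     # ...and bias
print(neuron_forward(x, w, b))  # a number in (0, 1)
```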

[Interactive demo: tune $w_1$, $w_2$, and $b$ (initial values 1.0, -1.0, 0.0) and watch the decision region shift; the readout shows the decision rule $\sigma(w_1 x_1 + w_2 x_2 + b) > 0.5$.]
The green line is where the output equals 0.5. The neuron splits the plane in half: a single linear boundary.

2. A layer = many neurons in parallel

Stack neurons side by side. Each has its own weight vector. Pack their weight vectors as the rows of a matrix $W$, their biases as a vector $b$. The whole layer is one matrix-vector multiply followed by an element-wise activation:

$$a = \sigma(Wx + b)$$

For a whole batch of inputs $X$ (one example per row of $X$), we usually transpose the convention so we can multiply once and process everyone in parallel:

$$A = \sigma(XW^\top + b)$$

One matmul handles every neuron and every example simultaneously.
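As a sanity check on the shapes, here's the batched layer in NumPy (a sketch; the row conventions follow the text, while ReLU and the name layer_forward are arbitrary choices of mine):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def layer_forward(X, W, b):
    # X: (batch, n_in)  -- one example per row
    # W: (n_out, n_in)  -- one neuron's weight vector per row
    # b: (n_out,)
    # One matmul covers every neuron and every example at once.
    return relu(X @ W.T + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))   # 32 examples, 4 features each
W = rng.normal(size=(8, 4))    # 8 neurons
b = np.zeros(8)
print(layer_forward(X, W, b).shape)  # (32, 8): one row per example
```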

3. Why activations exist at all

Without a non-linear activation, stacking layers does literally nothing useful. Two linear maps composed are still a single linear map. Watch:

$$W_2(W_1 x + b_1) + b_2 = (W_2 W_1)\,x + (W_2 b_1 + b_2) = W'x + b'$$

Two layers collapse into a single linear map; a hundred would collapse the same way.
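You can check the collapse numerically. A minimal sketch (layer sizes and the random seed are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
W2, b2 = rng.normal(size=(2, 5)), rng.normal(size=2)
x = rng.normal(size=3)

# Two stacked linear layers with no activation in between...
two_layers = W2 @ (W1 @ x + b1) + b2

# ...equal one linear layer with W' = W2 W1 and b' = W2 b1 + b2.
one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)

print(np.allclose(two_layers, one_layer))  # True
```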

[Interactive plot: compare any pair of activations side by side; solid line = function, dashed line = derivative.]
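For reference, three common activation/derivative pairs written out (a sketch; these are the standard closed forms, and the function names are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)           # sigma'(z) = sigma(z) * (1 - sigma(z))

def tanh_prime(z):
    return 1.0 - np.tanh(z) ** 2   # tanh'(z) = 1 - tanh(z)^2

def relu(z):
    return np.maximum(0.0, z)

def relu_prime(z):
    return (z > 0).astype(float)   # 0 where z <= 0, 1 where z > 0

z = np.linspace(-4.0, 4.0, 9)
print(np.column_stack([relu(z), relu_prime(z)]))
```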

4. Stacking layers — a deep network

Chain layers together. Each layer takes the previous layer's output as its input. We use a superscript in parentheses to label the layer index:

$$a^{(0)} = x, \qquad a^{(l)} = \sigma\!\left(W^{(l)} a^{(l-1)} + b^{(l)}\right) \quad \text{for } l = 1, \dots, L$$
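A forward pass is just that recurrence in a loop. A minimal sketch (ReLU everywhere and the list-of-(W, b) layout are my choices, not fixed conventions):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, layers):
    # layers: list of (W, b) pairs.
    # a^(0) = x;  a^(l) = relu(W^(l) @ a^(l-1) + b^(l))
    a = x
    for W, b in layers:
        a = relu(W @ a + b)
    return a  # in practice the last layer usually gets a task-specific activation

rng = np.random.default_rng(2)
sizes = [4, 16, 16, 2]  # input -> two hidden layers -> output
layers = [(rng.normal(size=(m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]
print(forward(rng.normal(size=4), layers).shape)  # (2,)
```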

5. Loss — measuring how wrong the network is

To train a network we need a single number that says "this prediction was bad." That number is the loss (or "cost"). We pick a loss function based on what kind of problem we're solving (each is sketched in code after the list):

Mean squared error — for regression (predicting numbers).
Binary cross-entropy — for yes/no problems.
Categorical cross-entropy — for multi-class classification.
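For concreteness, here's each loss as a few lines of NumPy (a sketch; the eps clipping that guards the logarithms is my addition, and y_hat is assumed to be a predicted probability):

```python
import numpy as np

def mse(y, y_hat):
    # Mean squared error: average squared gap between target and prediction.
    return np.mean((y - y_hat) ** 2)

def binary_cross_entropy(y, y_hat, eps=1e-12):
    # y in {0, 1}; y_hat is a predicted probability.
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))

def categorical_cross_entropy(Y, Y_hat, eps=1e-12):
    # Y: one-hot rows; Y_hat: rows of class probabilities.
    return -np.mean(np.sum(Y * np.log(np.clip(Y_hat, eps, 1.0)), axis=1))

print(mse(np.array([1.0, 2.0]), np.array([1.5, 1.5])))  # 0.25
```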