Neural networks, one neuron at a time
A neural network is a stack of linear layers separated by non-linear activation functions. That's it. The richness comes from how those simple pieces compose. We'll build it up from a single neuron — no skipping.
A single neuron, end to end
One neuron does three things in order:
- Weighted sum. Take the input vector $x$, dot it with a weight vector $w$, add a bias $b$: $z = w \cdot x + b$.
- Activation. Squash the result through a non-linear function $\sigma$ (sigmoid, ReLU, tanh, etc.).
- Output. The squashed number $a = \sigma(z)$ is what the neuron passes on.
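The three steps above fit in a few lines of NumPy. A minimal sketch with sigmoid as the activation (the weights, bias, and input are made-up values for illustration):

```python
import numpy as np

def neuron(x, w, b):
    """One neuron: weighted sum, then sigmoid activation."""
    z = np.dot(w, x) + b             # step 1: weighted sum z = w·x + b
    return 1.0 / (1.0 + np.exp(-z))  # step 2: sigmoid squashes z into (0, 1)

x = np.array([0.5, -1.2])  # input vector (hypothetical values)
w = np.array([2.0, 1.0])   # weight vector
b = -0.5                   # bias
print(neuron(x, w, b))     # step 3: the output, a single number in (0, 1)
```

Swapping sigmoid for ReLU or tanh only changes the last line of `neuron`; the weighted sum stays the same.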
Tune $w$ and $b$ in the demo below and watch the decision region shift:
The green line is where the output equals 0.5, which for a sigmoid is exactly where $w \cdot x + b = 0$. The neuron splits the plane in half: a single linear boundary.
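You can verify the boundary claim numerically: any point satisfying $w \cdot x + b = 0$ makes $z = 0$, and sigmoid of zero is exactly 0.5. A quick check with hypothetical weights:

```python
import numpy as np

w = np.array([2.0, -1.0])  # hypothetical weights
b = 0.5

# Pick a point on the line w·x + b = 0:  2*0 - 1*0.5 + 0.5 = 0
x_on_line = np.array([0.0, 0.5])
z = w @ x_on_line + b
print(1.0 / (1.0 + np.exp(-z)))  # sigmoid(0) = 0.5, right on the boundary
```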
A layer = many neurons in parallel
Stack neurons side by side. Each has its own weight vector. Pack their weight vectors as the rows of a matrix $W$, their biases as a vector $b$. The whole layer is one matrix-vector multiply followed by an element-wise activation: $a = \sigma(Wx + b)$.
For a whole batch of inputs (one per row of $X$), we usually transpose the convention so we can multiply once and process everyone in parallel: $A = \sigma(XW^\top + b)$, with the bias broadcast across rows.
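Both conventions can be sketched side by side. A small example (ReLU as the activation, random weights, shapes chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = lambda z: np.maximum(z, 0.0)  # ReLU, applied element-wise

# A layer of 3 neurons on 2-dimensional inputs:
W = rng.normal(size=(3, 2))  # one weight vector per row
b = rng.normal(size=3)

# Single input: matrix-vector multiply.
x = rng.normal(size=2)
a = sigma(W @ x + b)
print(a.shape)               # (3,) — one output per neuron

# A batch of 5 inputs, one per row of X: transpose convention,
# one matrix-matrix multiply processes everyone in parallel.
X = rng.normal(size=(5, 2))
A = sigma(X @ W.T + b)       # b broadcasts across the 5 rows
print(A.shape)               # (5, 3) — one output row per input row
```

Row $i$ of `A` is exactly what the single-input version produces on `X[i]`; the batch form just does all five at once.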
Why activations exist at all
Without a non-linear activation, stacking layers does literally nothing useful. Two linear maps composed are still a single linear map. Watch:
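The collapse is easy to demonstrate: multiply the weight matrices and fold the biases, and the "two-layer" network becomes one layer with identical outputs. A sketch with random matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Two linear layers with no activation between them...
two_layers = W2 @ (W1 @ x + b1) + b2

# ...collapse into a single linear layer:
W = W2 @ W1           # combined weights
b = W2 @ b1 + b2      # combined bias
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # True
```

No matter how many linear layers you stack, the same algebra flattens them all into one.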
Compare any pair of activations side-by-side (solid line = function, dashed = derivative):
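The three activations mentioned above, with their derivatives, can be written out directly. A minimal reference table in code (derivatives written as functions of the pre-activation $z$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# name -> (function, derivative)
activations = {
    "sigmoid": (sigmoid,
                lambda z: sigmoid(z) * (1.0 - sigmoid(z))),
    "tanh":    (np.tanh,
                lambda z: 1.0 - np.tanh(z) ** 2),
    "relu":    (lambda z: np.maximum(z, 0.0),
                lambda z: (np.asarray(z) > 0).astype(float)),
}

for name, (f, df) in activations.items():
    print(f"{name}: f(0) = {f(0.0)}, f'(0) = {df(0.0)}")
```

At $z = 0$ the contrast is already visible: sigmoid's slope peaks at 0.25, tanh's at 1, while ReLU's derivative jumps between 0 and 1.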
Stacking layers — a deep network
Chain layers together. Each layer takes the previous layer's output as its input. We use a superscript in parentheses to label the layer index: $a^{(l)} = \sigma\left(W^{(l)} a^{(l-1)} + b^{(l)}\right)$, with $a^{(0)} = x$.
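That recurrence is a short loop in code. A sketch of the full forward pass (ReLU between layers, a linear final layer, and made-up layer widths):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, params):
    """params: list of (W, b) pairs, one per layer.
    ReLU between layers; the last layer is left linear."""
    a = x                                  # a^(0) = x
    for l, (W, b) in enumerate(params):
        z = W @ a + b                      # z^(l) = W^(l) a^(l-1) + b^(l)
        a = relu(z) if l < len(params) - 1 else z
    return a

rng = np.random.default_rng(2)
sizes = [3, 5, 4, 1]                       # layer widths (hypothetical)
params = [(rng.normal(size=(n, m)), rng.normal(size=n))
          for m, n in zip(sizes[:-1], sizes[1:])]

y = forward(rng.normal(size=3), params)
print(y.shape)                             # (1,) — a single prediction
```

Leaving the last layer linear is a common choice for regression; classification networks typically end in a sigmoid or softmax instead.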
Loss — measuring how wrong the network is
To train a network we need a single number that says "this prediction was bad." That number is the loss (or "cost"). We pick a loss function based on what kind of problem we're solving:
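Two standard examples, one per problem type: mean squared error for regression and binary cross-entropy for two-class classification. A minimal sketch (the sample numbers are made up):

```python
import numpy as np

def mse(y_pred, y_true):
    """Mean squared error: the usual loss for regression."""
    return np.mean((y_pred - y_true) ** 2)

def binary_cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy: the usual loss for two-class classification.
    p is the predicted probability of class 1; eps guards against log(0)."""
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

print(mse(np.array([0.5, 2.0]), np.array([1.0, 2.0])))               # 0.125
print(binary_cross_entropy(np.array([0.9, 0.2]), np.array([1.0, 0.0])))
```

Either way the output is one scalar: a perfect prediction drives it to zero, and training is the business of pushing it down.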