Linear algebra
Neural networks are made of two things: numbers arranged in rectangles, and a few simple operations on those rectangles. This chapter is the entire toolkit. Read it slowly. Every symbol you see here is one you will see for the rest of the guide.
Vectors are just lists of numbers
A vector is an ordered list of numbers. Think of it as a column of values stacked vertically. We write vectors with bold lowercase letters so they're easy to spot:

**v** = (v₁, v₂, …, vₙ)
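As a concrete sketch (using NumPy, which is an assumption of this example rather than something the text prescribes), a vector is literally just an array of numbers:

```python
import numpy as np

# An ordered list of three numbers: a 3-dimensional vector.
v = np.array([2.0, -1.0, 3.5])

v.shape   # one axis with three entries: (3,)
v[0]      # components are accessed by position
```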
Adding and scaling vectors
Two operations are the bedrock of everything else: adding two vectors, and multiplying a vector by a scalar (a regular, non-bold number). Both work component by component:

**u** + **v** = (u₁ + v₁, …, uₙ + vₙ)  and  c**v** = (cv₁, …, cvₙ)
Drag the sliders below. The dashed green vector is the weighted combination of the two solid vectors; watch it change as you move the sliders.
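Without the sliders, the same two operations look like this in code (a NumPy sketch; the specific numbers are illustrative):

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])

# Addition: matched components are summed.
s = u + v              # (4, 1)

# Scaling: every component is stretched by the same factor.
t = 2.0 * v            # (6, -2)

# A weighted combination of both, like the dashed vector in the demo.
w = 1.5 * u + 0.5 * v  # (3, 2.5)
```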
The dot product: measuring similarity
The dot product takes two vectors and produces a single number. Multiply matched components, then sum:

**a** · **b** = a₁b₁ + a₂b₂ + ⋯ + aₙbₙ
There's a beautiful geometric fact about dot products you'll use forever:

**a** · **b** = ‖**a**‖ ‖**b**‖ cos θ

where θ is the angle between the two vectors. The dot product is positive when they point in similar directions, zero when they're perpendicular, and negative when they point apart. That's exactly why it works as a similarity measure.
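Both readings of the dot product, algebraic and geometric, in a short NumPy sketch (the vectors here are made up for illustration):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Algebraic reading: multiply matched components, then sum.
# 1*4 + 2*5 + 3*6 = 32
dot = float(np.dot(a, b))

# Geometric reading: dot = |a| * |b| * cos(theta).
cos_theta = dot / (np.linalg.norm(a) * np.linalg.norm(b))
# Close to 1 here, because a and b point in similar directions.
```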
Matrices are stacks of vectors
A matrix is a rectangle of numbers. We write matrices with capital letters, like A, and give each entry two indices, first the row, then the column: Aᵢⱼ is the entry in row i, column j.
You can read a matrix two ways, both useful:
- As stacked rows. Each row is a vector. In a neural network, each row is one neuron's weight vector.
- As stacked columns. Each column is a vector. In that view, each column is one input feature's contribution to every neuron.
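Both views correspond to a way of slicing the same array. A small NumPy sketch (the matrix values are illustrative):

```python
import numpy as np

# A 2 x 3 matrix: 2 rows, 3 columns.
W = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Row view: each row is a vector (one neuron's weight vector).
row0 = W[0]      # [1, 2, 3]

# Column view: each column is a vector (one feature's contribution).
col0 = W[:, 0]   # [1, 4]
```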
Matrix multiplication = many dot products at once
To multiply two matrices A and B, the inner dimensions must match: if A is m × n and B is n × p, then C = AB is m × p. Each entry of C is one dot product:

Cᵢⱼ = (row i of A) · (column j of B)
Here is the same kind of multiplication as a worked example: a 2 × 3 matrix times a 3 × 2 matrix gives a 2 × 2 result.

| 1 | 2 | 3 |
| 4 | 5 | 6 |

times

| 7 | 8 |
| 9 | 10 |
| 11 | 12 |

equals

| 58 | 64 |
| 139 | 154 |

For example, the top-left entry is the dot product of the first row and the first column: 1·7 + 2·9 + 3·11 = 58.
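The same multiplication can be checked in one line of NumPy (`@` is Python's matrix-multiplication operator):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])

# (2 x 3) @ (3 x 2) -> (2 x 2). Each entry of C is the dot product
# of a row of A with a column of B, e.g. C[0, 0] = 1*7 + 2*9 + 3*11 = 58.
C = A @ B
```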
Transpose, and one full forward pass
The transpose of a matrix flips it across its diagonal: rows become columns and vice versa. We write the transpose with a little T in the upper-right: if A is m × n, then Aᵀ is n × m, and (Aᵀ)ᵢⱼ = Aⱼᵢ.
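In NumPy the transpose is the `.T` attribute (a small sketch with an illustrative matrix):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)

At = A.T                    # shape (3, 2): rows and columns swap
# Every entry obeys At[i, j] == A[j, i].
```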
Now the payoff. The forward pass of a single neural-network layer is just:

**y** = W**x** + **b**

Multiply the weight matrix W by the input vector **x**, then add a bias vector **b**. Each entry of W**x** is a dot product: row i of W (one neuron's weight vector) dotted with the input.
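A minimal sketch of that forward pass in NumPy. The shapes (4 neurons, 3 inputs), the random values, and the trailing ReLU nonlinearity are all assumptions for illustration; the linear-algebra core is the single line `W @ x + b`:

```python
import numpy as np

rng = np.random.default_rng(0)

W = rng.standard_normal((4, 3))   # 4 neurons x 3 inputs: one weight row per neuron
b = rng.standard_normal(4)        # one bias per neuron
x = rng.standard_normal(3)        # input vector

# The forward pass: a matrix-vector product plus a bias.
# Entry i of z is the dot product of neuron i's weight row with x, plus b[i].
z = W @ x + b

# Layers usually end with an elementwise nonlinearity; ReLU is one common
# (assumed) choice and is not part of the linear algebra itself.
y = np.maximum(z, 0.0)            # shape (4,): one output per neuron
```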