a visual representation of Andrej Karpathy's

micro
grad

A scalar-valued autograd engine in ~100 lines of Python. Every neural network, however vast, reduces to this: repeated simple operations, each one remembering how to undo itself.

01 / the idea

Every operation builds a graph.

Writing L = (a + b) × c quietly constructs a directed acyclic graph: an intermediate node d = a + b feeds a multiply node. Each node records its value and, crucially, how to reverse its contribution to the loss. The backward pass is just reading those notes in reverse order.

∂L/∂a = ∂L/∂d · ∂d/∂a   ←   chain rule

hover any node after running a pass

data (forward)
gradient (backward)
leaf / input
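The graph-and-notes idea can be sketched in a few lines, assuming a pared-down Value that supports only + and × (micrograd's real class also handles ReLU, pow, and more):

```python
class Value:
    """A scalar that remembers how it was made — a minimal sketch."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None   # the "note" on how to undo this op
        self._prev = set(_children)

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(out)/d(self) = d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topological order: children before parents, then read in reverse
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

a, b, c = Value(2.0), Value(-3.0), Value(4.0)
d = a + b        # d = -1.0
L = d * c        # L = -4.0
L.backward()     # chain rule: ∂L/∂a = ∂L/∂d · ∂d/∂a = c · 1 = 4.0
```

The `+=` on gradients matters: if a node feeds into two downstream ops, its gradient is the sum of both contributions.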

02 / the unit

A neuron is a dot product.

Inputs multiplied by learned weights, summed with a bias, squashed through an activation. The magic is that every number here is a Value, differentiable by construction.

out = ReLU(w₁x₁ + w₂x₂ + b)

hover weights and nodes to inspect values
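The dot-product-plus-squash above, with plain floats for clarity (in micrograd every number here would be a Value, so gradients come for free):

```python
def relu(x):
    return x if x > 0 else 0.0

class Neuron:
    """out = ReLU(w·x + b) — a sketch with hand-picked weights."""
    def __init__(self, w, b):
        self.w = list(w)
        self.b = b

    def __call__(self, x):
        return relu(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

n = Neuron(w=[0.5, -0.25], b=0.1)
out = n([2.0, 4.0])    # 0.5·2 − 0.25·4 + 0.1 = 0.1; ReLU keeps it
```

Flip an input's sign and the pre-activation goes negative, so ReLU clamps the output to zero — that gate is what makes the network nonlinear.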

03 / the network

Stack neurons into layers.

Each layer transforms its inputs into a new representation. Edge thickness encodes weight magnitude; color encodes sign. Watch the signal flow forward and the gradients flow back.

architecture: 2 → 4 → 4 → 1
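The 2 → 4 → 4 → 1 stack can be sketched as layers of neurons, again with plain floats and a seeded RNG for reproducibility (every layer ReLUs here for brevity; micrograd leaves the final layer linear):

```python
import random

def relu(x):
    return x if x > 0 else 0.0

class Neuron:
    def __init__(self, n_in, rng):
        self.w = [rng.uniform(-1, 1) for _ in range(n_in)]
        self.b = 0.0

    def __call__(self, x):
        return relu(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def parameters(self):
        return self.w + [self.b]

class Layer:
    def __init__(self, n_in, n_out, rng):
        self.neurons = [Neuron(n_in, rng) for _ in range(n_out)]

    def __call__(self, x):
        return [n(x) for n in self.neurons]

    def parameters(self):
        return [p for n in self.neurons for p in n.parameters()]

class MLP:
    def __init__(self, sizes, seed=0):
        rng = random.Random(seed)
        self.layers = [Layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

    def parameters(self):
        return [p for layer in self.layers for p in layer.parameters()]

net = MLP([2, 4, 4, 1])      # the architecture shown: 2 → 4 → 4 → 1
out = net([1.0, -1.0])       # a list with one scalar
```

Counting parameters by hand checks the wiring: (2·4 + 4) + (4·4 + 4) + (4·1 + 1) = 37.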

04 / the loop

Descend the gradient.

Forward pass → compute loss → backward pass → nudge every parameter against its gradient, scaled by a learning rate. Repeat. Watch the boundary emerge.

θ ← θ − α · ∂L/∂θ
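The update rule in miniature, on a one-parameter loss with a hand-written gradient standing in for the backward pass (a sketch of the loop, not the network training above):

```python
def loss(theta):
    # a toy loss with its minimum at θ = 3
    return (theta - 3.0) ** 2

def grad(theta):
    # analytic ∂L/∂θ, playing the role of .backward()
    return 2.0 * (theta - 3.0)

theta, lr = 0.0, 0.1
for step in range(50):
    g = grad(theta)        # backward pass
    theta -= lr * g        # θ ← θ − α · ∂L/∂θ
```

Each step shrinks the error by a constant factor (here 1 − 2α = 0.8), so θ converges geometrically toward 3; too large an α and the same recurrence diverges instead.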

decision boundary

live readout: step 0 · loss -- · accuracy --