- Input: code computing a function
- Output: code to compute one or more derivatives of the function.

AD writes functions as a sequence of simple compositions $f_{5}(f_{4}(f_{3}(f_{2}(f_{1}(x)))))$

And writes derivatives using the chain rule:

$f_{′}(x)=f_{5}(f_{4}(f_{3}(f_{2}(f_{1}(x)))))∗f_{4}(f_{3}(f_{2}(f_{1}(x))))∗f_{3}(f_{2}(f1(x)))∗f_{2}(f_{1}(x))∗f_{1}(x)$

We decompose code using the chain rule to make derivative code. This can lead to a lot of redundant computations. We can use dynamic programming to avoid redundant calculations.

## Multi-variable AD

We define a computation graph as a DAG

- Root nodes are the parameters (and inputs).
- Branch nodes are computed values (𝛼 values).
- Leaf node is the function value.

Two stages (example of a function that takes $x$ and $y$ and calculates a $z$):

- Forward AD pass is called forward propagation:
- Computes $z$ from $x$ and $y$ and passes it to its outputs
- Storing intermediate calculations $∂x∂z $ and $∂y∂z $

- Backward AD pass is called backpropagation:
- Starts from the end with $∂L/∂L$ which is just 1
- Computes $∂x∂L $ and $∂y∂L $ from $∂z∂L $
- Using intermediate calculations stored during forward pass
- $∂x∂L =∂z∂L ∂x∂z $
- $∂y∂L =∂z∂L ∂y∂z $