Automatic Differentiation (AD)

Input: code computing a function
Output: code to compute one or more derivatives of the function.

AD writes functions as a sequence of simple compositions $f_{5} (f_{4} (f_{3} (f_{2} (f_{1} (x)))))$

And writes derivatives using the chain rule:

$f^{'} (x) = f_{5}^{'} (f_{4} (f_{3} (f_{2} (f_{1} (x))))) * f_{4}^{'} (f_{3} (f_{2} (f_{1} (x)))) * f_{3}^{'} (f_{2} (f 1 (x))) * f_{2}^{'} (f_{1} (x)) * f_{1}^{'} (x)$

We decompose code using the chain rule to make derivative code. This can lead to a lot of redundant computations. We can use dynamic programming to avoid redundant calculations.

Multi-variable AD

We define a computation graph as a DAG

Root nodes are the parameters (and inputs).
Branch nodes are computed values (𝛼 values).
Leaf node is the function value.

Two stages (example of a function that takes $x$ and $y$ and calculates a $z$ ):

Forward AD pass is called forward propagation:
- Computes $z$ from $x$ and $y$ and passes it to its outputs
- Storing intermediate calculations $\frac{\partial z}{\partial x}$ and $\frac{\partial z}{\partial y}$
Backward AD pass is called backpropagation:
- Starts from the end with $\partial L / \partial L$ which is just 1
- Computes $\frac{\partial L}{\partial x}$ and $\frac{\partial L}{\partial y}$ from $\frac{\partial L}{\partial z}$
  - Using intermediate calculations stored during forward pass
  - $\frac{\partial L}{\partial x} = \frac{\partial L}{\partial z} \frac{\partial z}{\partial x}$
  - $\frac{\partial L}{\partial y} = \frac{\partial L}{\partial z} \frac{\partial z}{\partial y}$

jzhao.xyz

Recent Writing

2024: Centering

Taste is a guide for what is worthwhile

Agentic Computing

Building a BFT JSON CRDT

Recent Notes

TrueTime

Concurrency control