• Input: code computing a function
  • Output: code to compute one or more derivatives of the function.

AD writes functions as a sequence of simple compositions

And writes derivatives using the chain rule:

We decompose code using the chain rule to make derivative code. This can lead to a lot of redundant computations. We can use dynamic programming to avoid redundant calculations.

Multi-variable AD

We define a computation graph as a DAG

  • Root nodes are the parameters (and inputs).
  • Branch nodes are computed values (𝛼 values).
  • Leaf node is the function value.

Two stages (example of a function that takes and and calculates a ):

  1. Forward AD pass is called forward propagation:
    • Computes from and and passes it to its outputs
    • Storing intermediate calculations and
  2. Backward AD pass is called backpropagation:
    • Starts from the end with which is just 1
    • Computes and from
      • Using intermediate calculations stored during forward pass