Maximum a Posteriori (MAP) Estimation

Maximizes the posterior: $\hat{w} \in \arg\max_{w} P(w \mid D)$

Given our data $D$, which model $w$ is the most probable?

MAP is connected to maximum likelihood estimation (MLE) through Bayes' rule:

$P(w \mid D) = \frac{P(D \mid w)\,P(w)}{P(D)} \propto P(D \mid w)\,P(w)$

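Since $P(D)$ does not depend on $w$, it does not change which $w$ maximizes the posterior. Intuitively, the prior $P(w)$ accounts for how likely each model is before we see any data. We can also treat it as a regularizer.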
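Here is a minimal Python sketch of MAP on a toy problem made up for illustration. The candidate models are five hypothetical coin biases, `PRIOR` is a hypothetical prior that favors a fair coin, and the data are 4 heads and 1 tail. The sketch checks that dropping $P(D)$ leaves the argmax unchanged, and it shows the prior pulling the estimate away from the MLE.

```python
# Hypothetical example: the candidate "models" w are coin biases on a coarse grid,
# and D is one sequence of flips with 4 heads and 1 tail.
CANDIDATES = [0.1, 0.3, 0.5, 0.7, 0.9]
# Hypothetical prior P(w): we believe the coin is probably close to fair.
PRIOR = [0.05, 0.2, 0.5, 0.2, 0.05]
HEADS, TAILS = 4, 1


def likelihood(w):
    """P(D | w) for one particular sequence with HEADS heads and TAILS tails."""
    return w**HEADS * (1 - w) ** TAILS


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Numerator of Bayes' rule: P(D | w) P(w) for every candidate.
unnormalized = [likelihood(w) * p for w, p in zip(CANDIDATES, PRIOR)]
# Evidence P(D) = sum_w P(D | w) P(w); it is the same divisor for every w.
evidence = sum(unnormalized)
posterior = [u / evidence for u in unnormalized]

w_map = CANDIDATES[argmax(posterior)]
w_map_unnormalized = CANDIDATES[argmax(unnormalized)]
w_mle = CANDIDATES[argmax([likelihood(w) for w in CANDIDATES])]

print("MLE:", w_mle, "MAP:", w_map, "MAP (unnormalized):", w_map_unnormalized)
```

With this prior, the MLE picks $w = 0.7$ and MAP picks $w = 0.5$. Four heads out of five flips is not enough evidence to overcome a prior concentrated on a fair coin. Dividing by $P(D)$ rescales every candidate by the same factor, so it never changes the argmax.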

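Assuming the $n$ data points $D_1, \dots, D_n$ are independent given $w$, the likelihood factorizes. Taking the negative log, which is strictly decreasing, then turns the maximization into a minimization:

$$\hat{w} \in \arg\max_{w} P(w \mid D) \equiv \arg\max_{w} \prod_{i=1}^{n} P(D_i \mid w)\,P(w) \equiv \arg\min_{w} \left\{ -\sum_{i=1}^{n} \log P(D_i \mid w) - \log P(w) \right\}$$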

Here $-\log P(w)$ acts as the regularization term. In fact, many regularizers are equivalent to negative log-priors.
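The next sketch checks the equivalence of the two objectives on the same kind of hypothetical coin-flip setup (all names are illustrative). Maximizing $\prod_i P(D_i \mid w)\,P(w)$ and minimizing $-\sum_i \log P(D_i \mid w) - \log P(w)$ select the same $w$. The sketch also shows one practical reason to prefer the log form, assuming ordinary 64-bit floats: the raw product underflows once there are many data points.

```python
import math

# Hypothetical coin-flip setup: candidate biases w and a prior P(w).
CANDIDATES = [0.1, 0.3, 0.5, 0.7, 0.9]
PRIOR = [0.05, 0.2, 0.5, 0.2, 0.05]


def bernoulli(x, w):
    """P(D_i | w) for a single flip x (1 = heads, 0 = tails)."""
    return w if x == 1 else 1 - w


def product_objective(w, prior, data):
    """prod_i P(D_i | w) * P(w), multiplied out directly (to be maximized)."""
    value = prior
    for x in data:
        value *= bernoulli(x, w)
    return value


def neg_log_objective(w, prior, data):
    """-sum_i log P(D_i | w) - log P(w) (to be minimized)."""
    return -sum(math.log(bernoulli(x, w)) for x in data) - math.log(prior)


def map_by_product(data):
    scores = [product_objective(w, p, data) for w, p in zip(CANDIDATES, PRIOR)]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return CANDIDATES[best], scores


def map_by_neg_log(data):
    scores = [neg_log_objective(w, p, data) for w, p in zip(CANDIDATES, PRIOR)]
    best = min(range(len(scores)), key=lambda i: scores[i])
    return CANDIDATES[best], scores


small = [1, 1, 0, 1, 1]  # 4 heads, 1 tail
big = small * 400  # 1600 heads, 400 tails

print(map_by_product(small)[0], map_by_neg_log(small)[0])
print(map_by_product(big)[1])  # every product has underflowed to 0.0
print(map_by_neg_log(big)[0])
```

On the small dataset, both forms pick $w = 0.5$. On 2000 flips, every product underflows to `0.0`, so its argmax is meaningless. The negative log form still selects $w = 0.7$; with this much data, the likelihood dominates the prior.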

Relation to regularized loss functions
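The following worked derivation shows the pattern behind both cases that follow. It rests on one added assumption: each of the $d$ weights has an independent zero-mean Gaussian prior, $w_j \sim \mathcal{N}(0, 1/\lambda)$. Under that assumption, the negative log-prior is an L2 penalty plus a constant that does not depend on $w$, so the constant does not move the minimizer:

$$
\begin{aligned}
-\log P(w) &= -\sum_{j=1}^{d} \log\left( \sqrt{\frac{\lambda}{2\pi}} \exp\left(-\frac{\lambda}{2} w_j^2\right) \right) \\
&= \frac{\lambda}{2} \sum_{j=1}^{d} w_j^2 + \frac{d}{2} \log\frac{2\pi}{\lambda} \\
&= \frac{\lambda}{2} \|w\|^2 + \text{const.}
\end{aligned}
$$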

L2-Regularized Least Squares

If we assume a Gaussian likelihood $y_i \sim \mathcal{N}(w^\top x_i, 1)$ and a Gaussian prior $w_j \sim \mathcal{N}(0, 1/\lambda)$, then MAP estimation is equivalent to minimizing $f(w) = \frac{1}{2}\|Xw - y\|^2 + \frac{\lambda}{2}\|w\|^2$.
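Here is a minimal check in Python for a single feature. It assumes unit noise variance and a prior $w \sim \mathcal{N}(0, 1/\lambda)$; the data and $\lambda = 2$ are made up. The negative log-posterior differs from $f(w)$ by the same constant at every $w$, so both have the same minimizer. For one feature, setting $f'(w) = 0$ gives $w = \frac{x^\top y}{x^\top x + \lambda}$.

```python
import math


def gaussian_nll(z, mean, var):
    """-log N(z; mean, var)."""
    return 0.5 * math.log(2 * math.pi * var) + (z - mean) ** 2 / (2 * var)


def neg_log_posterior(w, xs, ys, lam):
    # Likelihood: y_i ~ N(w * x_i, 1); prior: w ~ N(0, 1/lam).
    nll = sum(gaussian_nll(y, w * x, 1.0) for x, y in zip(xs, ys))
    return nll + gaussian_nll(w, 0.0, 1.0 / lam)


def ridge_objective(w, xs, ys, lam):
    # f(w) = 1/2 ||Xw - y||^2 + lam/2 ||w||^2 for a single feature.
    return 0.5 * sum((w * x - y) ** 2 for x, y in zip(xs, ys)) + 0.5 * lam * w**2


def ridge_closed_form(xs, ys, lam):
    # Setting f'(w) = 0 gives w = (x . y) / (x . x + lam).
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.2, 1.9, 3.2, 3.8]
lam = 2.0

# Minimize the negative log-posterior over a grid with step 0.001 on [-2, 2].
grid = [i / 1000 for i in range(-2000, 2001)]
w_map = min(grid, key=lambda w: neg_log_posterior(w, xs, ys, lam))
w_ridge = ridge_closed_form(xs, ys, lam)

# The gap between the two objectives should be the same constant at every w.
gaps = [
    neg_log_posterior(w, xs, ys, lam) - ridge_objective(w, xs, ys, lam)
    for w in (-1.0, 0.0, 0.5, 2.0)
]
print(w_map, w_ridge, gaps)
```

The grid minimizer of the negative log-posterior lands within one grid step of the closed form $29.8 / 32 = 0.93125$.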

L2-Regularized Robust Regression

If we assume a Laplace likelihood $y_i \sim \mathrm{Laplace}(w^\top x_i, 1)$ and a Gaussian prior $w_j \sim \mathcal{N}(0, 1/\lambda)$, then MAP estimation is equivalent to minimizing $f(w) = \|Xw - y\|_1 + \frac{\lambda}{2}\|w\|^2$.
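The same kind of check works for the Laplace case. This sketch again uses one feature, a unit-scale Laplace likelihood, a prior $w \sim \mathcal{N}(0, 1/\lambda)$, and made-up data. Since $f(w)$ is not differentiable, it simply minimizes over a fine grid. The data follow $y = x$ except for one outlier, which illustrates why this is called robust regression.

```python
import math


def laplace_nll(z, mean, scale=1.0):
    """-log Laplace(z; mean, scale)."""
    return math.log(2 * scale) + abs(z - mean) / scale


def gaussian_nll(z, mean, var):
    """-log N(z; mean, var)."""
    return 0.5 * math.log(2 * math.pi * var) + (z - mean) ** 2 / (2 * var)


def neg_log_posterior(w, xs, ys, lam):
    # Likelihood: y_i ~ Laplace(w * x_i, 1); prior: w ~ N(0, 1/lam).
    nll = sum(laplace_nll(y, w * x) for x, y in zip(xs, ys))
    return nll + gaussian_nll(w, 0.0, 1.0 / lam)


def robust_objective(w, xs, ys, lam):
    # f(w) = ||Xw - y||_1 + lam/2 ||w||^2 for a single feature.
    return sum(abs(w * x - y) for x, y in zip(xs, ys)) + 0.5 * lam * w**2


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 2.0, 3.0, 4.0, 30.0]  # y = x except for one outlier
lam = 1.0

# f is not differentiable, so minimize on a grid with step 0.001 on [-5, 5].
grid = [i / 1000 for i in range(-5000, 5001)]
w_robust = min(grid, key=lambda w: robust_objective(w, xs, ys, lam))
w_map = min(grid, key=lambda w: neg_log_posterior(w, xs, ys, lam))

# For comparison: the L2-regularized least-squares solution on the same data.
w_ridge = sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

# The gap between the two objectives should be the same constant at every w.
gaps = [
    neg_log_posterior(w, xs, ys, lam) - robust_objective(w, xs, ys, lam)
    for w in (-1.0, 0.0, 1.0, 3.0)
]
print(w_robust, w_map, round(w_ridge, 3))
```

On this data the robust estimate stays at $w = 1$. The L2-regularized least-squares solution, $180/56 \approx 3.21$, is dragged toward the outlier.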

jzhao.xyz
