Maximizes the posterior probability of the model given the data: $\hat{w} \in \arg\max_{w} P(w \mid D)$

Given our data $D$, which model $w$ is the most probable?

MAP is connected to MLE through Bayes' rule:

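$P(w \mid D) = \frac{P(D \mid w)\, P(w)}{P(D)} \propto P(D \mid w)\, P(w)$

The evidence $P(D)$ does not depend on $w$, so it does not change the maximizer. MAP differs from MLE only through the prior $P(w)$.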
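A small discrete sketch of the rule above, assuming a hypothetical coin whose bias $w$ can only take three values, an assumed prior over those values, and data consisting of 7 heads in 10 independent flips. It shows that normalizing by $P(D)$ does not move the argmax, and that the prior can pull the MAP estimate away from the MLE.

```python
# Hypothetical setup: three candidate coin biases w and an assumed prior P(w).
candidates = [0.3, 0.5, 0.8]
prior = {0.3: 0.2, 0.5: 0.6, 0.8: 0.2}
heads, flips = 7, 10  # observed data D (hypothetical)


def likelihood(w):
    """P(D | w) for one specific sequence with `heads` heads out of `flips`."""
    return w ** heads * (1 - w) ** (flips - heads)


# Unnormalized posterior P(D | w) P(w); P(D) is the same for every candidate.
unnormalized = {w: likelihood(w) * prior[w] for w in candidates}

# Dividing by the evidence P(D) = sum_w P(D | w) P(w) gives the posterior P(w | D).
evidence = sum(unnormalized.values())
posterior = {w: p / evidence for w, p in unnormalized.items()}

mle_w = max(candidates, key=likelihood)                    # ignores the prior
map_w = max(candidates, key=lambda w: unnormalized[w])     # uses the prior
map_w_normalized = max(candidates, key=lambda w: posterior[w])

print("MLE:", mle_w)  # MLE: 0.8
print("MAP:", map_w)  # MAP: 0.5
```

Here the strong prior on $w = 0.5$ outweighs the somewhat higher likelihood of $w = 0.8$.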

Intuitively, the prior $P(w)$ encodes how likely we believe the model $w$ is before seeing any data. We can also treat it as a regularizer.

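Assuming the examples $D_i$ are independent given $w$, and using the fact that $\log$ is monotonically increasing:

$\hat{w} \in \arg\max_{w} P(w \mid D) \equiv \arg\max_{w} \left\{ \prod_{i=1}^{n} P(D_{i} \mid w)\, P(w) \right\} \equiv \arg\min_{w} \left\{ -\sum_{i=1}^{n} \log P(D_{i} \mid w) - \log P(w) \right\}$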

where $-\log P(w)$ acts as the regularization term. In fact, many regularizers are equivalent to negative log-priors (up to an additive constant).
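To see the negative log-prior acting as a regularizer, here is a sketch that minimizes the two objectives over a grid. It assumes hypothetical Bernoulli coin-flip data and a Beta$(a, b)$ prior with $a = b = 3$, neither of which comes from the notes. For $a, b > 1$ the MAP estimate has the known closed form $(h + a - 1)/(n + a + b - 2)$, where $h$ is the number of heads, so the grid result can be checked against it.

```python
import math

# Hypothetical data: 7 heads (1) and 3 tails (0) from i.i.d. coin flips.
data = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0]
a, b = 3.0, 3.0  # assumed Beta(a, b) prior on the bias w


def neg_log_likelihood(w):
    """-sum_i log P(D_i | w) for Bernoulli(w) observations."""
    return -sum(math.log(w) if x == 1 else math.log(1 - w) for x in data)


def neg_log_prior(w):
    """-log P(w) for Beta(a, b), dropping the normalizing constant."""
    return -((a - 1) * math.log(w) + (b - 1) * math.log(1 - w))


grid = [i / 10_000 for i in range(1, 10_000)]  # candidate w values in (0, 1)

# MLE minimizes the data term alone; MAP adds the "regularizer" -log P(w).
w_mle = min(grid, key=neg_log_likelihood)
w_map = min(grid, key=lambda w: neg_log_likelihood(w) + neg_log_prior(w))

h, n = sum(data), len(data)
print(w_mle, h / n)                          # grid MLE vs. closed form h / n
print(w_map, (h + a - 1) / (n + a + b - 2))  # grid MAP vs. closed form
```

The prior term pulls the estimate from $0.7$ toward $0.5$, just as a regularizer shrinks parameters toward a preferred value.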

## Relation to regularized loss functions

### L2-Regularized Least Squares

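If we assume a Gaussian likelihood $y_i \sim \mathcal{N}(w^\top x_i, 1)$ and a Gaussian prior $w_j \sim \mathcal{N}(0, 1/\lambda)$, then MAP estimation is equivalent to minimizing $f(w) = \frac{1}{2}\|Xw - y\|_{2}^{2} + \frac{\lambda}{2}\|w\|_{2}^{2}$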
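A numerical sketch of this equivalence, assuming randomly generated data, hypothetical "true" weights, and $\lambda = 2$. It writes out the full negative log-posterior with all Gaussian normalizing constants and checks that it differs from $f(w)$ by the same constant for every $w$. It then computes the minimizer from the normal equations $(X^\top X + \lambda I)w = X^\top y$, obtained by setting $\nabla f(w) = X^\top(Xw - y) + \lambda w$ to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 50, 3, 2.0  # hypothetical sizes and regularization strength
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)  # hypothetical weights


def f(w):
    """L2-regularized least squares: 0.5*||Xw - y||^2 + (lam/2)*||w||^2."""
    r = X @ w - y
    return 0.5 * r @ r + 0.5 * lam * w @ w


def neg_log_posterior(w):
    """-log P(y | X, w) - log P(w) with y_i ~ N(w^T x_i, 1), w_j ~ N(0, 1/lam)."""
    r = X @ w - y
    nll = 0.5 * r @ r + 0.5 * n * np.log(2 * np.pi)
    nlp = 0.5 * lam * w @ w + 0.5 * d * np.log(2 * np.pi / lam)
    return nll + nlp


# The gap is the same constant for every w, so both share the same minimizer.
gaps = [neg_log_posterior(w) - f(w) for w in rng.normal(size=(5, d))]
print(np.allclose(gaps, gaps[0]))  # True

# Closed-form MAP estimate from the normal equations.
w_map = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
grad = X.T @ (X @ w_map - y) + lam * w_map
print(np.allclose(grad, 0))  # True
```

Using `np.linalg.solve` rather than forming an explicit inverse is the usual, numerically safer choice.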

### L2-Regularized Robust Regression

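If we assume a Laplace likelihood $y_i \sim \text{Laplace}(w^\top x_i, 1)$ and a Gaussian prior $w_j \sim \mathcal{N}(0, 1/\lambda)$, then MAP estimation is equivalent to minimizing $f(w) = \|Xw - y\|_{1} + \frac{\lambda}{2}\|w\|_{2}^{2}$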
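The same check for the robust case, under the same kind of assumptions (random data, hypothetical weights, $\lambda = 2$, Laplace noise with scale 1). Since $-\log$ of the Laplace density $\tfrac{1}{2}e^{-|r|}$ is $|r| + \log 2$, the negative log-posterior should again differ from $f(w)$ only by a constant. The L1 term is not differentiable, so there is no closed form. The sketch swaps in a plain subgradient method with step size $1/(\lambda t)$, tracking the best iterate, as one simple way to minimize $f$. It is not the only option, nor necessarily the best one.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 50, 3, 2.0  # hypothetical sizes and regularization strength
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.laplace(size=n)  # hypothetical weights


def f(w):
    """L2-regularized robust regression: ||Xw - y||_1 + (lam/2)*||w||^2."""
    r = X @ w - y
    return np.abs(r).sum() + 0.5 * lam * w @ w


def neg_log_posterior(w):
    """-log P(y | X, w) - log P(w) with Laplace(w^T x_i, 1) noise, w_j ~ N(0, 1/lam)."""
    r = X @ w - y
    nll = np.abs(r).sum() + n * np.log(2.0)
    nlp = 0.5 * lam * w @ w + 0.5 * d * np.log(2 * np.pi / lam)
    return nll + nlp


gaps = [neg_log_posterior(w) - f(w) for w in rng.normal(size=(5, d))]
print(np.allclose(gaps, gaps[0]))  # True

# Subgradient method: sign(.) gives a subgradient of the L1 term.
w = np.zeros(d)
best_w, best_f = w.copy(), f(w)
for t in range(1, 2001):
    g = X.T @ np.sign(X @ w - y) + lam * w
    w = w - g / (lam * t)
    if f(w) < best_f:
        best_w, best_f = w.copy(), f(w)

print(best_w, best_f)
```

In practice, smooth approximations of the absolute value or a linear-programming formulation are common alternatives to plain subgradient steps.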