Given our data, what is the model is the best model?
Intuitively, is accounting for how ‘likely’ this model is. We can also treat this as a regularizer.
Where acts like the regularizing term. In fact, many regularizers are equivalent to negative log-priors.
Relation between regularized loss functions
L2-Regularized Least Squares
If we assume a Gaussian likelihood and a Gaussian prior, then MAP estimation is equivalent to minimizing
L2-Regularized Robust Regression
If we assume a Laplace likelihood and a Gaussian prior, then MAP estimation is equivalent to minimizing