Probabilistic Classifier
We want a model of $P(y_i = \textrm{important} | x_i )$ for use in decision theory.
- Predictions generally map $w^Tx_i$ to labels for classes (for binary prediction, we used $\textrm{sign}(x)$)
- Probabilities we want to map $w^Tx_i$ to the range $[0,1]$
The most common choice is to use the sigmoid function:
$$h(z_i) = \frac{1}{1+\exp(-z_i)}$$
# Multi-class Probabilities
See also: multi-class classification
The softmax function allows us to map $k$ real numbers $z_i = w_c^Tx_i$ to probabilities
$$P(y | z_1, z_2, \dots, z_k) = \frac{\exp(z_y)}{\sum_{c=1}^k \exp(z_c))}$$