We want a model of $P(y_i = \text{important} \mid x_i)$ for use in decision theory.

- Predictions generally map $w^T x_i$ to class labels (for binary prediction, we used $\operatorname{sign}(w^T x_i)$)
- For probabilities, we instead want to map $w^T x_i$ to the range $[0,1]$

The most common choice is the sigmoid function, applied to $z_i = w^T x_i$:

$h(z_i) = \frac{1}{1 + \exp(-z_i)}$
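As a minimal sketch, assuming NumPy and a hypothetical weight vector `w` and data matrix `X` (rows are the $x_i$), the sigmoid can be computed as below. Instead of evaluating $1/(1+\exp(-z))$ directly, it uses the identity $h(z) = \exp(-\log(1 + e^{-z}))$ with `np.logaddexp`, so that large negative scores do not overflow `exp`.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores z = w^T x to probabilities in (0, 1)."""
    z = np.asarray(z, dtype=float)
    # log(1 + exp(-z)) computed stably as logaddexp(0, -z);
    # exponentiating its negative gives 1 / (1 + exp(-z)).
    return np.exp(-np.logaddexp(0.0, -z))

# Hypothetical weights and examples, for illustration only.
w = np.array([0.5, -1.0])
X = np.array([[2.0, 1.0],
              [0.0, 3.0],
              [4.0, 0.0]])

z = X @ w            # scores z_i = w^T x_i, here [0.0, -3.0, 2.0]
probs = sigmoid(z)   # P(y_i = important | x_i) under this model
labels = np.sign(z)  # the hard predictions we used before
```

Note that $\operatorname{sign}(z_i) = +1$ exactly when $h(z_i) > 0.5$, so thresholding the probability at $0.5$ recovers the earlier binary prediction, while the probability additionally expresses how confident the model is.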

## Multi-class Probabilities

See also: multi-class classification

The softmax function maps $k$ real numbers $z_c = w_c^T x_i$ (one score per class $c = 1, \dots, k$) to probabilities:

$P(y \mid z_1, z_2, \dots, z_k) = \frac{\exp(z_y)}{\sum_{c=1}^{k} \exp(z_c)}$
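A minimal NumPy sketch of the softmax follows, assuming a hypothetical weight matrix `W` whose rows are the $w_c$. Subtracting $\max_c z_c$ from every score before exponentiating does not change the result (the factor $\exp(-\max_c z_c)$ cancels between numerator and denominator), but it keeps `exp` from overflowing.

```python
import numpy as np

def softmax(z):
    """Map k real scores z_c = w_c^T x_i to probabilities that sum to 1."""
    z = np.asarray(z, dtype=float)
    # Shift by the max score for numerical stability; the shift cancels.
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

# Hypothetical weights: one row w_c per class (k = 3), for illustration only.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])
x = np.array([2.0, 1.0])

z = W @ x                  # scores [2.0, 1.0, -3.0]
p = softmax(z)             # P(y = c | z_1, ..., z_k) for each class c
y_hat = int(np.argmax(p))  # most probable class (index 0 here)
```

With $k = 2$ and $z_2 = 0$, the softmax probability of class 1 is $\exp(z_1)/(\exp(z_1) + 1) = 1/(1 + \exp(-z_1))$, which is exactly the sigmoid, so the binary case is a special case of the multi-class one.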