Search IconIcon to open search

Binary Classification

Last updated Oct 28, 2022 Edit Source

Least squares error may overpenalize. Only thing we care about is the sign, not how far away it is from the decision boundary.

Could we instead minimize number of classification errors? This is called the 0-1 loss function: you either get the classification wrong (1) or right (0).

$$\mathcal L(i,j) = \begin{cases} 0 & i = j \\ 1 & i \neq j \end{cases} $$

Illustration above is if $y_i = 1$. Flip for $y_i = -1$

Unfortunately, 0-1 Loss is non-convex. We can, once again, use a convex approximation which is called the Hinge loss:

$$\mathcal L(i,j) = \max(0, 1 - y_iw^Tx_i)$$

See also: SVM

This is an upper bound on the 0-1 loss (as illustrated by the picture). For example, if the hinge loss is 18.3, then the number of training errors is at most 18.

Similarly, we can use the log-sum-exp trick to get the logistic loss which is convex and differentiable.

$$\mathcal L(i,j) \approx \log(1 + \exp(-y_iw^Tx_i))$$

# Perceptron

Only works for linearly-separable data