Search

# Binary Classification

Last updated Oct 28, 2022 Edit Source

• Set $y_i = +1$ for one class (“important”)
• Set $y_i = -1$ for the other class (“not important”)
• To predict, we look at whether $w^Tx_i$ is closer to +1 or -1
• $\hat y_i = \textrm{sign} (w^Tx_i)$

Least squares error may overpenalize. Only thing we care about is the sign, not how far away it is from the decision boundary.

Could we instead minimize number of classification errors? This is called the 0-1 loss function: you either get the classification wrong (1) or right (0).

$$\mathcal L(i,j) = \begin{cases} 0 & i = j \\ 1 & i \neq j \end{cases}$$ Illustration above is if $y_i = 1$. Flip for $y_i = -1$

Unfortunately, 0-1 Loss is non-convex. We can, once again, use a convex approximation which is called the Hinge loss:

$$\mathcal L(i,j) = \max(0, 1 - y_iw^Tx_i)$$

This is an upper bound on the 0-1 loss (as illustrated by the picture). For example, if the hinge loss is 18.3, then the number of training errors is at most 18.

Similarly, we can use the log-sum-exp trick to get the logistic loss which is convex and differentiable.

$$\mathcal L(i,j) \approx \log(1 + \exp(-y_iw^Tx_i))$$

### # Perceptron

Only works for linearly-separable data

• Searches for a $w$ such that $\textrm{sign}(w^Tx_i ) = y_i, \forall i$
• Intuition is that you search for the ledge
• Start with $w^0 = 0$
• Classify each example until we reach a mistake
• Then, update $w$ to $w^{t+1} = w^t + y_ix_i$
• If a perfect classifier exists, this algorithm finds one in finite number of steps