k-Nearest Neighbours (KNN)
To classify an example, we find the $k$ training examples closest to it and predict the mode (most common label) among those $k$ neighbours.
This works under the assumption that examples with similar features are likely to have similar labels.
As $k$ grows, training error tends to increase and approximation error (the gap between training and test error) tends to decrease.
We measure distance using a norm of the difference between feature vectors. The most common choice is the L2-norm (Euclidean norm).
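As a concrete formula (notation assumed here: $x_i$ is a training example, $\tilde{x}$ is the query example, and $d$ is the number of features), the Euclidean distance is:

$$\lVert x_i - \tilde{x} \rVert_2 = \sqrt{\sum_{j=1}^{d} (x_{ij} - \tilde{x}_j)^2}$$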
- $O(1)$ training (it just stores the training data)
- $O(nd)$ per prediction (an $O(d)$ distance calculation for each of the $n$ training examples)
- $O(nd)$ space to keep all $n$ training examples in memory
- KNN is non-parametric: the size of the model grows with the number of training examples $n$.
KNN can perform poorly in high dimensions, since all points become roughly equidistant as $d$ grows (see: curse of dimensionality).
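The procedure above can be sketched in a few lines; this is a minimal NumPy implementation, not any particular library's API, and the toy data is made up for illustration:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    # O(nd): squared L2 distance from the query to every training example
    dists = np.sum((X_train - x_query) ** 2, axis=1)
    # Indices of the k nearest neighbours
    nearest = np.argsort(dists)[:k]
    # Predict the mode of the k neighbours' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two well-separated clusters with labels 0 and 1
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X, y, np.array([0.1, 0.1]), k=3))  # → 0
print(knn_predict(X, y, np.array([5.0, 5.1]), k=3))  # → 1
```

Note there is no training step at all ($O(1)$); all the work happens at prediction time, which is the $O(nd)$ cost listed above.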