• Clusters are defined by “dense” regions.
  • Examples in non-dense regions don’t get clustered
  • Clusters can be non-convex

It is non-parametric (there is no fixed number of clusters )

DBSCAN

Main idea: merge all neighbouring core points to form clusters

Defines

  • Epsilon (): distance we use to decide if another point is a “neighbour”
  • MinNeighbours: number of neighbours needed to say a region is “dense”
    • If you have at least minNeighbours “neighbours”, you are called a “core” point

Pseudocode;

  • For each example :
    • If is already assigned to a cluster, do nothing.
    • Test whether is a ‘core’ point ( minNeighbours examples within ).
      • If is not core point, do nothing (this could be an outlier).
      • If is a core point, make a new cluster and call the “expand cluster” function.
        • Assign to this cluster all within distance of core point to this cluster.
        • For each new “core” point found, call “expand cluster” (recursively).

Issues

  • Ambiguity of “non-core” boundary points
  • Sensitive to the choice of and minNeighbours