Hierarchical clustering produces a tree of clusterings

  • Each node in the tree splits the data into 2 or more clusters.
  • Provides much more information than a single fixed clustering.
  • The leaves are often individual data points.

Often applied in phylogenetics

Agglomerative Clustering (Bottom-up)

Requires a “distance” measure between two clusters.

Cluster distance measures

  1. Distance between the closest members of $C_1$ and $C_2$. Also called single-link clustering:

     $d_{\min}(C_1, C_2) = \min_{x \in C_1,\, y \in C_2} d(x, y)$

  2. Distance between the farthest members of $C_1$ and $C_2$. Also called complete-link clustering:

     $d_{\max}(C_1, C_2) = \max_{x \in C_1,\, y \in C_2} d(x, y)$

  3. Average distance between members of $C_1$ and $C_2$. Also called group average clustering:

     $d_{\text{avg}}(C_1, C_2) = \dfrac{1}{|C_1|\,|C_2|} \sum_{x \in C_1} \sum_{y \in C_2} d(x, y)$

The agglomerative algorithm:

  1. Start with each point in its own cluster.
  2. At each step, merge the two “closest” clusters (under the chosen measure).
  3. Stop when one big cluster contains all points.
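A minimal sketch of the three cluster-distance measures, assuming Euclidean distance between points (the function names are illustrative, not from the notes):

```python
from itertools import product
from math import dist  # Euclidean distance (Python 3.8+)

def single_link(A, B):
    """Distance between the closest members of clusters A and B."""
    return min(dist(a, b) for a, b in product(A, B))

def complete_link(A, B):
    """Distance between the farthest members of clusters A and B."""
    return max(dist(a, b) for a, b in product(A, B))

def group_average(A, B):
    """Average distance over all cross-cluster pairs."""
    return sum(dist(a, b) for a, b in product(A, B)) / (len(A) * len(B))

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(3.0, 0.0), (5.0, 0.0)]
print(single_link(A, B))    # 2.0
print(complete_link(A, B))  # 5.0
print(group_average(A, B))  # 3.5
```

Note how the three measures can disagree: single-link looks only at the nearest pair, so it tends to form long “chains,” while complete-link favors compact clusters.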

Naive implementation cost is $O(n^3)$: there are $O(n)$ merge steps, and each step searches over $O(n^2)$ pairs of clusters.
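A sketch of the naive bottom-up procedure using single-link distance (names are illustrative); each of the $O(n)$ merge steps scans all $O(n^2)$ cluster pairs, which is where the cubic cost comes from:

```python
from itertools import combinations
from math import dist

def agglomerate(points, k):
    """Naive bottom-up clustering: start with singletons, repeatedly
    merge the two closest clusters (single-link) until k remain."""
    clusters = [[p] for p in points]          # each point in its own cluster
    while len(clusters) > k:
        # scan all O(n^2) cluster pairs for the closest pair
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(dist(a, b)
                               for a in clusters[ij[0]]
                               for b in clusters[ij[1]]),
        )
        clusters[i] += clusters.pop(j)        # merge cluster j into cluster i
    return clusters

pts = [(0.0, 0.0), (0.5, 0.0), (10.0, 0.0), (10.5, 0.0)]
print(agglomerate(pts, 2))
```

Recording the sequence of merges (rather than stopping at a fixed k) yields the full tree of clusterings.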

Divisive Clustering (Top-down)

Start with all examples in one cluster, then divide recursively (e.g., run K-means on a cluster, then run K-means again on each of the resulting clusters).
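A rough sketch of this idea, using a tiny hand-rolled 2-means to bisect clusters recursively (the deterministic seeding and helper names are simplifying assumptions, not a standard implementation):

```python
from math import dist
from statistics import mean

def two_means(points, iters=20):
    """Tiny 2-means used to bisect one cluster. Seeds with the two
    farthest-apart points (an assumption chosen for determinism)."""
    c1, c2 = max(((a, b) for a in points for b in points),
                 key=lambda p: dist(p[0], p[1]))
    for _ in range(iters):
        left = [p for p in points if dist(p, c1) <= dist(p, c2)]
        right = [p for p in points if dist(p, c1) > dist(p, c2)]
        if not left or not right:
            break
        c1 = tuple(map(mean, zip(*left)))    # update centroids
        c2 = tuple(map(mean, zip(*right)))
    return left, right

def divisive(points, depth):
    """Top-down clustering: recursively split clusters with 2-means."""
    if depth == 0 or len(points) < 2:
        return [points]
    out = []
    for part in two_means(points):
        if part:  # guard against a degenerate empty split
            out.extend(divisive(part, depth - 1))
    return out

pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
print(divisive(pts, 1))
```

Increasing `depth` grows the tree: each level splits every current cluster again, mirroring the top-down hierarchy.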


Cluster both the training examples and the features. This helps to figure out why things are clustered together:

  • Run the clustering method on the examples (the rows of the data matrix).
  • Run the clustering method on the features (the columns of the data matrix).
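One way to sketch the two runs above: cluster the rows of a data matrix, then cluster its columns by transposing (the single-link `agglomerate` helper is illustrative):

```python
from itertools import combinations
from math import dist

def agglomerate(rows, k):
    """Single-link bottom-up merging until k clusters remain (sketch)."""
    clusters = [[r] for r in rows]
    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: min(dist(a, b)
                                      for a in clusters[ij[0]]
                                      for b in clusters[ij[1]]))
        clusters[i] += clusters.pop(j)
    return clusters

# X: rows are examples, columns are features
X = [(1.0, 1.0, 9.0),
     (1.1, 0.9, 9.1),
     (8.0, 8.0, 0.0)]

example_clusters = agglomerate(X, 2)            # cluster the examples (rows)
feature_clusters = agglomerate(list(zip(*X)), 2)  # cluster the features (columns)
print(example_clusters)
print(feature_clusters)
```

Here the first two examples group together, and so do the first two features, which suggests those features behave similarly across examples.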

A dendrogram is a tree diagram that describes the hierarchy of clusters generated by these clustering methods.