Hierarchical Clustering

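Hierarchical clustering produces a tree of clusterings: each level of the tree is one clustering of the data, and the clusters at one level are formed by merging (or splitting) clusters from the adjacent level.

Often applied in phylogenetics, where the tree represents evolutionary relationships.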

Agglomerative Clustering (Bottom up)

Requires a “distance” measure between two clusters.

Cluster distance measures

Distance between closest members of $C_{1}$ and $C_{2}$. Also called single-link clustering: $\min_{a \in C_{1},\, b \in C_{2}} d(a, b)$
Distance between farthest members of $C_{1}$ and $C_{2}$. Also called complete-link clustering: $\max_{a \in C_{1},\, b \in C_{2}} d(a, b)$
Average distance between members of $C_{1}$ and $C_{2}$. Also called group average clustering: $\frac{1}{\lvert C_{1} \rvert \lvert C_{2} \rvert} \sum_{a \in C_{1}} \sum_{b \in C_{2}} d(a, b)$
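
The three cluster distance measures can be written directly from their definitions. The following is a minimal sketch, assuming points are tuples of numbers and that $d$ is the Euclidean distance; the function names and the sample clusters are illustrative and not part of the original notes.

```python
import math
from itertools import product


def dist(a, b):
    """Euclidean distance between two points given as equal-length tuples (assumed choice of d)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_link(c1, c2):
    """Distance between the closest pair, one point taken from each cluster."""
    return min(dist(a, b) for a, b in product(c1, c2))


def complete_link(c1, c2):
    """Distance between the farthest pair, one point taken from each cluster."""
    return max(dist(a, b) for a, b in product(c1, c2))


def group_average(c1, c2):
    """Mean distance over all |C1| * |C2| cross-cluster pairs."""
    return sum(dist(a, b) for a, b in product(c1, c2)) / (len(c1) * len(c2))


# Two small 1-D clusters (illustrative data).
c1 = [(0.0,), (1.0,)]
c2 = [(3.0,), (6.0,)]
print(single_link(c1, c2))    # closest pair is 1 and 3
print(complete_link(c1, c2))  # farthest pair is 0 and 6
print(group_average(c1, c2))  # mean of the four distances 3, 6, 2, 5
```

On the same pair of clusters the three measures can differ substantially, which is why the choice of measure changes which clusters get merged first.
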
Start with each point in its own cluster.
At each step, merge the two “closest” clusters.
Stop when a single cluster contains all points.
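
The merge loop can be sketched in a few lines. This is a naive illustration rather than an efficient implementation: it assumes Euclidean distance and uses single-link as the default cluster distance, and the names `agglomerate`, `single_link`, and `dist` are made up for this example.

```python
import math


def dist(a, b):
    """Euclidean distance between two points (assumed distance function)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_link(c1, c2):
    """Distance between the closest members of two clusters."""
    return min(dist(a, b) for a in c1 for b in c2)


def agglomerate(points, cluster_dist=single_link):
    """Naive bottom-up clustering; returns the merges in the order they happen."""
    clusters = [[p] for p in points]  # each point starts in its own cluster
    merges = []
    while len(clusters) > 1:
        # Search every pair of clusters for the closest one.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_dist(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((d, clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


points = [(0.0,), (1.0,), (5.0,), (7.0,)]  # illustrative 1-D data
merges = agglomerate(points)
for d, left, right in merges:
    print(d, left, right)
```

With $n$ points the loop always performs $n - 1$ merges, and the list of merges with their distances is exactly the information needed to draw the tree.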

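A naive implementation costs $O (n^{3} d)$ for $n$ examples with $d$ features (here $d$ is the number of features, not the distance function): there are $n - 1$ merges, and finding the two closest clusters at each merge compares up to $O (n^{2})$ pairs of points at $O (d)$ per distance.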

Divisive Clustering (Top down)

Start with all examples in one cluster, then repeatedly divide (e.g., run K-means on a cluster, then run it again on each of the resulting clusters).
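
As a rough sketch of this top-down idea, the code below repeatedly splits a cluster in two with a basic K-means (K = 2). Several details are assumptions made for illustration only: the deterministic initialization, the `min_size` stopping rule, and the helper names `two_means` and `divide`.

```python
def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(xs) / len(points) for xs in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, iters=20):
    """Split points into two groups with a basic K-means (K = 2).

    Assumed initialization: the centers start at the smallest and largest
    points in lexicographic order, which keeps the run deterministic.
    """
    centers = [min(points), max(points)]
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            # Assign each point to its nearest center (ties go to the first).
            nearest = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            groups[nearest].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split, e.g. all points identical
        new_centers = [mean(g) for g in groups]
        if new_centers == centers:
            break  # assignments have converged
        centers = new_centers
    return groups


def divide(points, min_size=2):
    """Top-down clustering: nested tuples of splits with lists of points at the leaves."""
    if len(points) <= min_size:
        return points
    left, right = two_means(points)
    if not left or not right:
        return points
    return (divide(left, min_size), divide(right, min_size))


points = [(0.0,), (1.0,), (10.0,), (11.0,), (12.0,)]  # illustrative 1-D data
tree = divide(points)
print(tree)
```

Each nested tuple is one split, so the nesting depth of `tree` mirrors the levels of the hierarchy.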

Cluster both the training examples and the features (biclustering). Grouping the features as well helps explain why particular examples are clustered together.

A dendrogram describes the hierarchy of clusters generated by the clustering methods.
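
One way to make the dendrogram concrete is to store it as a nested structure and read flat clusterings off it. In this sketch an internal node is a tuple `(height, left, right)`, where `height` is the distance at which the two subtrees were merged, and a leaf is an example index. The representation, the function names, and the merge history itself are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical merge history for four examples: 0 and 1 merge at distance 1.0,
# 2 and 3 merge at distance 2.0, and the two groups merge at distance 4.0.
tree = (4.0, (1.0, 0, 1), (2.0, 2, 3))


def leaves(node):
    """All example indices under a node."""
    if isinstance(node, int):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def cut(node, height):
    """Flat clustering obtained by cutting the dendrogram at the given height."""
    if isinstance(node, int) or node[0] <= height:
        return [leaves(node)]
    _, left, right = node
    return cut(left, height) + cut(right, height)


def render(node, depth=0):
    """Indented text view of the tree: one line per merge or leaf."""
    pad = "  " * depth
    if isinstance(node, int):
        return [f"{pad}example {node}"]
    h, left, right = node
    return [f"{pad}merge at distance {h}"] + render(left, depth + 1) + render(right, depth + 1)


print("\n".join(render(tree)))
print(cut(tree, 1.5))  # [[0, 1], [2], [3]]
print(cut(tree, 3.0))  # [[0, 1], [2, 3]]
```

Cutting at a lower height gives more, smaller clusters; cutting above the final merge gives the single cluster containing every example.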