Hierarchical Clustering
Hierarchical clustering produces a tree of clusterings
 Each node in the tree splits the data into 2 or more clusters.
 Much more information than using a fixed clustering.
 Often have individual data points as leaves.
Often applied in phylogenetics
# Agglomerative Clustering (Bottom up)
Requires a “distance” measure between two clusters.
Cluster distance measures

Distance between closest members of $C_1$ and $C_2$. Also called singlelink clustering: $\min d(a,b), a \in C_1, b \in C_2$

Distance between farthest members of $C_1$ and $C_2$. Also called completelink clustering: $\max d(a,b), a \in C_1, b \in C_2$

Average distance between members of $C_1$ and $C_2$. Also called group average clustering: $\frac{1}{C_1C_2} \sum_{a \in C_1} \sum_{b \in C_2} d(a,b)$

Starts with each point in its own cluster.

Each step merges the two “closest” clusters.

Stop with one big cluster that has all points.
Naive implementation cost is $O(n^3d)$
# Divisive Clustering (Topdown)
Start with all examples in one cluster, then start dividing. (e.g., run Kmeans on a cluster, then run again on resulting clusters)
# Biclustering
Cluster the training examples and features. Helps to figure out the ‘why’ on why things are clustered together
 Run clustering method on $X$
 Run clustering method on $X^T$
A dendrogram describes the hierarchy of clusters generated by the clustering methods.