# Hierarchical Clustering
Hierarchical clustering produces a tree of clusterings:
- Each node in the tree splits the data into 2 or more clusters.
- This gives much more information than a single fixed clustering.
- The leaves are often individual data points.
It is often applied in phylogenetics.
# Agglomerative Clustering (Bottom up)
Requires a “distance” measure between two clusters.
Cluster distance measures:
- Distance between the closest members of $C_1$ and $C_2$ (single-link clustering): $\min_{a \in C_1, b \in C_2} d(a,b)$.
- Distance between the farthest members of $C_1$ and $C_2$ (complete-link clustering): $\max_{a \in C_1, b \in C_2} d(a,b)$.
- Average distance between members of $C_1$ and $C_2$ (group average clustering): $\frac{1}{|C_1||C_2|} \sum_{a \in C_1} \sum_{b \in C_2} d(a,b)$.
Algorithm (sketched in the code below):
- Start with each point in its own cluster.
- At each step, merge the two “closest” clusters.
- Stop with one big cluster that contains all points.
The naive implementation costs $O(n^3 d)$.
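A minimal sketch of this naive agglomerative procedure, assuming a NumPy array `X` of shape $(n, d)$ and Euclidean distances between points; the `cluster_distance` helper and the cubic-cost merge loop are illustrative, not an optimized implementation (SciPy's `scipy.cluster.hierarchy.linkage` does this efficiently):

```python
import numpy as np

def cluster_distance(X, A, B, linkage="single"):
    """Distance between clusters A and B (lists of row indices) under a linkage rule."""
    # Pairwise Euclidean distances between every member of A and every member of B.
    D = np.linalg.norm(X[A][:, None, :] - X[B][None, :, :], axis=-1)
    if linkage == "single":    # closest members
        return D.min()
    if linkage == "complete":  # farthest members
        return D.max()
    if linkage == "average":   # group average
        return D.mean()
    raise ValueError(linkage)

def agglomerative(X, linkage="single"):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [[i] for i in range(len(X))]  # start: each point in its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest cluster distance.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: cluster_distance(X, clusters[ab[0]], clusters[ab[1]], linkage),
        )
        merges.append((clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return merges

X = np.random.default_rng(0).normal(size=(10, 2))
for a, b in agglomerative(X, linkage="average"):
    print(a, "+", b)
```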
# Divisive Clustering (Top-down)
Start with all examples in one cluster, then recursively divide (e.g., run K-means on a cluster, then run it again on each resulting cluster).
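A minimal sketch of this top-down idea using repeated 2-means splits; scikit-learn's `KMeans` is assumed to be available, and the `max_depth` / `min_size` stopping rules are illustrative choices, not part of the notes:

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive(X, max_depth=3, min_size=5):
    """Return a nested list describing the tree of clusters over the row indices of X."""
    def split(indices, depth):
        if depth >= max_depth or len(indices) < min_size:
            return indices.tolist()  # leaf: stop dividing
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[indices])
        left, right = indices[labels == 0], indices[labels == 1]
        if len(left) == 0 or len(right) == 0:  # degenerate split: stop
            return indices.tolist()
        return [split(left, depth + 1), split(right, depth + 1)]
    return split(np.arange(len(X)), 0)

X = np.random.default_rng(1).normal(size=(40, 2))
print(divisive(X))
```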
# Biclustering
Cluster both the training examples and the features, which helps explain *why* examples end up clustered together:
- Run a clustering method on $X$ (clusters the examples).
- Run a clustering method on $X^T$ (clusters the features).
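A minimal sketch of this "cluster $X$ and cluster $X^T$" idea using k-means, assuming scikit-learn; the numbers of row and column clusters are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(2).normal(size=(100, 20))  # 100 examples, 20 features

row_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)    # cluster examples
col_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X.T)  # cluster features

# Reordering rows and columns by their cluster labels makes the biclusters
# visible as blocks in the data matrix.
X_blocked = X[np.argsort(row_labels)][:, np.argsort(col_labels)]
print(X_blocked.shape)
```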
A dendrogram describes the hierarchy of clusters generated by the clustering methods.
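A minimal sketch of drawing a dendrogram with SciPy, assuming matplotlib is available for plotting; the choice of `average` linkage is arbitrary here:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.random.default_rng(3).normal(size=(15, 2))
Z = linkage(X, method="average")  # hierarchy of agglomerative merges
dendrogram(Z)                     # leaves are individual data points
plt.xlabel("data point index")
plt.ylabel("merge distance")
plt.show()
```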