Hierarchical Clustering
A clustering method that builds nested clusters by merging or splitting them successively
Hierarchical clustering is an unsupervised learning algorithm that creates a hierarchy of clusters, building a tree-like structure (dendrogram) to represent the arrangement of clusters. Unlike K-means, hierarchical clustering doesn't require specifying the number of clusters in advance and provides a complete hierarchy from which any number of clusters can be selected.
Two Main Approaches
Agglomerative (Bottom-Up)
Starts with each data point as a separate cluster and merges the closest pairs of clusters until only one cluster remains. This is the more common approach.
1. Start with N clusters, one per data point
2. Compute the distance between all pairs of clusters
3. Merge the closest pair of clusters
4. Repeat steps 2-3 until all points are in a single cluster (see the sketch below)
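A minimal, illustrative sketch of these steps using single linkage and Euclidean distance is shown below; the function name and toy data are hypothetical, and real workloads would use an optimized library such as SciPy or scikit-learn.

```python
import numpy as np

def agglomerative_single_linkage(X):
    # Step 1: start with each point as its own cluster
    clusters = [[i] for i in range(len(X))]
    merges = []  # record of (cluster_a, cluster_b, merge_distance)

    while len(clusters) > 1:
        # Step 2: compute single-linkage distance between every pair of clusters
        best_a, best_b, best_d = None, None, np.inf
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best_d:
                    best_a, best_b, best_d = a, b, d
        # Step 3: merge the closest pair of clusters
        merges.append((clusters[best_a], clusters[best_b], best_d))
        clusters[best_a] = clusters[best_a] + clusters[best_b]
        del clusters[best_b]
    # Step 4: the loop ends once all points sit in a single cluster
    return merges

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
for left, right, dist in agglomerative_single_linkage(X):
    print(left, right, round(dist, 3))
```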
Divisive (Top-Down)
Starts with all data points in one cluster and recursively splits the clusters until each data point forms its own cluster.
1. Start with one cluster containing all points
2. Find the best division of the cluster
3. Split the cluster into two
4. Repeat steps 2-3 on each resulting cluster until each point is its own cluster (see the sketch below)
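Below is a hedged sketch of one common divisive strategy: recursively splitting each cluster in two with 2-means (often called bisecting k-means). The function name, stopping rule, and toy data are illustrative assumptions, not a canonical implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive_split(X, indices, depth=0):
    # Stop when the cluster is a single point (each point is its own cluster)
    if len(indices) <= 1:
        print("  " * depth, sorted(indices))
        return
    # Find a two-way division of the current cluster with 2-means
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[indices])
    left = [i for i, lab in zip(indices, labels) if lab == 0]
    right = [i for i, lab in zip(indices, labels) if lab == 1]
    print("  " * depth, sorted(indices), "->", sorted(left), "+", sorted(right))
    # Recurse on each half
    divisive_split(X, left, depth + 1)
    divisive_split(X, right, depth + 1)

X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.2, 4.9], [10.0, 0.0]])
divisive_split(X, list(range(len(X))))
```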
Key Concepts
Dendrogram
A tree diagram that shows the hierarchical relationship between objects. It's used to visualize the results of hierarchical clustering, with the height of each branch indicating the distance at which two clusters were merged.
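As a sketch, SciPy's `linkage` and `dendrogram` functions can build and draw such a tree; the toy data and the choice of Ward linkage here are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[0.0, 0.0], [0.3, 0.2], [5.0, 5.0], [5.1, 4.8], [9.0, 1.0]])

Z = linkage(X, method="ward")          # linkage matrix: one row per merge
dendrogram(Z, labels=list(range(len(X))))
plt.ylabel("Merge distance")           # branch height = distance at merge
plt.show()
```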
Linkage Criteria
Methods for determining the distance between two clusters, which affect how clusters are merged (compared in the sketch after this list):
- Single: Minimum distance between any point in one cluster and any point in the other
- Complete: Maximum distance between any point in one cluster and any point in the other
- Average: Average distance over all cross-cluster pairs of points
- Ward: Merge the pair of clusters that gives the smallest increase in total within-cluster variance
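A brief sketch comparing the four criteria with scikit-learn's `AgglomerativeClustering`; the toy data and `n_clusters=2` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.2, 4.9], [9.0, 0.5]])

# Same data, different linkage criteria; the resulting labels can differ
for link in ("single", "complete", "average", "ward"):
    labels = AgglomerativeClustering(n_clusters=2, linkage=link).fit_predict(X)
    print(f"{link:8s} -> {labels}")
```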
Distance Metrics
Measures used to quantify how far apart (or dissimilar) data points are; several common choices are compared in the sketch after this list:
- Euclidean: Straight-line distance between points
- Manhattan: Sum of absolute differences along each coordinate
- Cosine: One minus the cosine of the angle between the vectors
- Correlation: One minus the Pearson correlation between the vectors
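A small sketch of these metrics using SciPy's `pdist` (note that SciPy calls Manhattan distance `cityblock`); the toy data are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist

X = np.array([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [5.0, 1.0, 0.0]])

# Each call returns the condensed vector of pairwise distances for X
for metric in ("euclidean", "cityblock", "cosine", "correlation"):
    print(f"{metric:12s} {np.round(pdist(X, metric=metric), 3)}")
```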
Cutting the Dendrogram
The process of selecting a specific number of clusters by cutting the dendrogram at a certain height, allowing flexibility in choosing the appropriate level of clustering.
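As a sketch, SciPy's `fcluster` can cut a linkage matrix either at a chosen height or into a chosen number of clusters; the threshold values below are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0.0, 0.0], [0.3, 0.2], [5.0, 5.0], [5.1, 4.8], [9.0, 1.0]])
Z = linkage(X, method="average")

labels_by_height = fcluster(Z, t=2.0, criterion="distance")  # cut at height 2.0
labels_by_count = fcluster(Z, t=3, criterion="maxclust")     # ask for 3 clusters
print(labels_by_height, labels_by_count)
```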
Advantages and Limitations
Advantages
- No need to specify number of clusters in advance
- Results are deterministic, with no dependence on random initialization (unlike K-means)
- Produces a dendrogram for visualization
- Can uncover hierarchical relationships
- Can capture non-spherical cluster shapes, depending on the linkage used
Limitations
- Computationally intensive (O(n³) time for a naive agglomerative implementation)
- Memory intensive for large datasets, since the pairwise distance matrix grows as O(n²)
- Sensitive to outliers
- Irreversible decisions (once clusters are merged/split)
- Different linkage criteria can produce different results