Hierarchical Clustering
A clustering method that builds nested clusters by merging or splitting them successively
Hierarchical clustering is an unsupervised learning algorithm that creates a hierarchy of clusters, building a tree-like structure (dendrogram) to represent the arrangement of clusters. Unlike K-means, hierarchical clustering doesn't require specifying the number of clusters in advance and provides a complete hierarchy from which any number of clusters can be selected.
Two Main Approaches
Agglomerative (Bottom-Up)
Starts with each data point as a separate cluster and merges the closest pairs of clusters until only one cluster remains. This is the more common approach.
1. Start with N clusters, one per data point
2. Compute the distance between all pairs of clusters
3. Merge the closest pair of clusters
4. Repeat steps 2-3 until all points are in a single cluster (see the sketch below)
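A minimal, illustrative sketch of these steps using single linkage and Euclidean distance is shown below; the function name and toy data are hypothetical, and real workloads would use an optimized library such as SciPy or scikit-learn.

```python
import numpy as np

def agglomerative_single_linkage(X):
    # Step 1: start with each point as its own cluster
    clusters = [[i] for i in range(len(X))]
    merges = []  # record of (cluster_a, cluster_b, merge_distance)

    while len(clusters) > 1:
        # Step 2: compute single-linkage distance between every pair of clusters
        best_a, best_b, best_d = None, None, np.inf
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best_d:
                    best_a, best_b, best_d = a, b, d
        # Step 3: merge the closest pair of clusters
        merges.append((clusters[best_a], clusters[best_b], best_d))
        clusters[best_a] = clusters[best_a] + clusters[best_b]
        del clusters[best_b]
    # Step 4: the loop ends once all points sit in a single cluster
    return merges

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
for left, right, dist in agglomerative_single_linkage(X):
    print(left, right, round(dist, 3))
```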
Divisive (Top-Down)
Starts with all data points in one cluster and recursively splits the clusters until each data point forms its own cluster.
1. Start with one cluster containing all points
2. Find the best division of the cluster
3. Split the cluster into two
4. Repeat steps 2-3 on each resulting cluster until each point is its own cluster (see the sketch below)
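Below is a hedged sketch of one common divisive strategy: recursively splitting each cluster in two with 2-means (often called bisecting k-means). The function name, stopping rule, and toy data are illustrative assumptions, not a canonical implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive_split(X, indices, depth=0):
    # Stop when the cluster is a single point (each point is its own cluster)
    if len(indices) <= 1:
        print("  " * depth, sorted(indices))
        return
    # Find a two-way division of the current cluster with 2-means
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[indices])
    left = [i for i, lab in zip(indices, labels) if lab == 0]
    right = [i for i, lab in zip(indices, labels) if lab == 1]
    print("  " * depth, sorted(indices), "->", sorted(left), "+", sorted(right))
    # Recurse on each half
    divisive_split(X, left, depth + 1)
    divisive_split(X, right, depth + 1)

X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.2, 4.9], [10.0, 0.0]])
divisive_split(X, list(range(len(X))))
```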
Key Concepts
Dendrogram
A tree diagram that shows the hierarchical relationship between objects. It's used to visualize the results of hierarchical clustering, with the height of each branch indicating the distance at which two clusters were merged.
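As a sketch, SciPy's `linkage` and `dendrogram` functions can build and draw such a tree; the toy data and the choice of Ward linkage here are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[0.0, 0.0], [0.3, 0.2], [5.0, 5.0], [5.1, 4.8], [9.0, 1.0]])

Z = linkage(X, method="ward")          # linkage matrix: one row per merge
dendrogram(Z, labels=list(range(len(X))))
plt.ylabel("Merge distance")           # branch height = distance at merge
plt.show()
```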
Linkage Criteria
Methods for determining the distance between two clusters, which affect how clusters are merged (compared in the sketch after this list):
- Single: Minimum distance between any point in one cluster and any point in the other
- Complete: Maximum distance between any point in one cluster and any point in the other
- Average: Average distance over all cross-cluster pairs of points
- Ward: Merge the pair of clusters that gives the smallest increase in total within-cluster variance
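A brief sketch comparing the four criteria with scikit-learn's `AgglomerativeClustering`; the toy data and `n_clusters=2` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.2, 4.9], [9.0, 0.5]])

# Same data, different linkage criteria; the resulting labels can differ
for link in ("single", "complete", "average", "ward"):
    labels = AgglomerativeClustering(n_clusters=2, linkage=link).fit_predict(X)
    print(f"{link:8s} -> {labels}")
```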
Distance Metrics
Measures used to quantify how far apart (or dissimilar) data points are; several common choices are compared in the sketch after this list:
- Euclidean: Straight-line distance between points
- Manhattan: Sum of absolute differences along each coordinate
- Cosine: One minus the cosine of the angle between the vectors
- Correlation: One minus the Pearson correlation between the vectors
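A small sketch of these metrics using SciPy's `pdist` (note that SciPy calls Manhattan distance `cityblock`); the toy data are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist

X = np.array([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [5.0, 1.0, 0.0]])

# Each call returns the condensed vector of pairwise distances for X
for metric in ("euclidean", "cityblock", "cosine", "correlation"):
    print(f"{metric:12s} {np.round(pdist(X, metric=metric), 3)}")
```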
Cutting the Dendrogram
The process of selecting a specific number of clusters by cutting the dendrogram at a certain height, allowing flexibility in choosing the appropriate level of clustering.
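As a sketch, SciPy's `fcluster` can cut a linkage matrix either at a chosen height or into a chosen number of clusters; the threshold values below are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0.0, 0.0], [0.3, 0.2], [5.0, 5.0], [5.1, 4.8], [9.0, 1.0]])
Z = linkage(X, method="average")

labels_by_height = fcluster(Z, t=2.0, criterion="distance")  # cut at height 2.0
labels_by_count = fcluster(Z, t=3, criterion="maxclust")     # ask for 3 clusters
print(labels_by_height, labels_by_count)
```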
Advantages and Limitations
Advantages
- No need to specify number of clusters in advance
- Results are deterministic, with no dependence on random initialization (unlike K-means)
- Produces a dendrogram for visualization
- Can uncover hierarchical relationships
- Can capture non-spherical cluster shapes, depending on the linkage used
Limitations
- Computationally intensive (O(n³) time for a naive agglomerative implementation)
- Memory intensive for large datasets, since the pairwise distance matrix grows as O(n²)
- Sensitive to outliers
- Irreversible decisions (once clusters are merged/split)
- Different linkage criteria can produce different results