Hierarchical Clustering

A clustering method that builds nested clusters by merging or splitting them successively

What is Hierarchical Clustering?
A clustering technique that creates a hierarchy of clusters

Hierarchical clustering is an unsupervised learning algorithm that creates a hierarchy of clusters, building a tree-like structure (dendrogram) to represent the arrangement of clusters. Unlike K-means, hierarchical clustering doesn't require specifying the number of clusters in advance and provides a complete hierarchy from which any number of clusters can be selected.
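
For concreteness, here is a minimal sketch of agglomerative clustering using scikit-learn (assuming it is installed); the synthetic data and the choice of three clusters are purely illustrative.

  from sklearn.cluster import AgglomerativeClustering
  from sklearn.datasets import make_blobs

  # 150 synthetic points drawn from 3 Gaussian blobs
  X, _ = make_blobs(n_samples=150, centers=3, random_state=42)

  # Build the merge hierarchy bottom-up, then cut it to obtain 3 flat clusters
  model = AgglomerativeClustering(n_clusters=3, linkage="ward")
  labels = model.fit_predict(X)
  print(labels[:10])  # cluster index (0-2) for the first 10 points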

Two Main Approaches

Agglomerative (Bottom-Up)

Starts with each data point as its own cluster and repeatedly merges the closest pair of clusters until only one cluster remains. This is the more common approach; a bare-bones version of the merge loop is sketched after the steps below.

  1. Start with N clusters (each data point)
  2. Compute distance between all pairs of clusters
  3. Merge the closest pair of clusters
  4. Repeat steps 2-3 until all points are in a single cluster
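
The following bare-bones sketch mirrors these four steps with single linkage and a naive O(n³) search over cluster pairs; in practice an optimized routine such as scipy.cluster.hierarchy.linkage would be used instead.

  import numpy as np

  def agglomerate(X):
      # Step 1: every point starts as its own cluster
      clusters = [[i] for i in range(len(X))]
      merges = []
      while len(clusters) > 1:
          # Step 2: cluster distance = closest pair of points (single linkage)
          best = (None, None, np.inf)
          for a in range(len(clusters)):
              for b in range(a + 1, len(clusters)):
                  d = min(np.linalg.norm(X[i] - X[j])
                          for i in clusters[a] for j in clusters[b])
                  if d < best[2]:
                      best = (a, b, d)
          a, b, d = best
          # Step 3: merge the closest pair and record the merge height
          merges.append((clusters[a], clusters[b], d))
          clusters[a] = clusters[a] + clusters[b]
          del clusters[b]
          # Step 4: the loop repeats until one cluster remains
      return merges

  X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9], [9.0, 0.0]])
  for left, right, height in agglomerate(X):
      print(left, "+", right, "at distance", round(height, 2))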

Divisive (Top-Down)

Starts with all data points in one cluster and recursively splits clusters until each data point forms its own cluster; a rough sketch using repeated two-way splits follows the steps below.

  1. Start with one cluster containing all points
  2. Find the best division of the cluster
  3. Split the cluster into two
  4. Repeat steps 2-3 until each point is its own cluster
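
Divisive clustering is rarely available out of the box; one common way to approximate the "best division" in step 2 is to split each cluster in two with K-means (bisecting K-means). The sketch below takes that route and assumes scikit-learn is installed; it is illustrative, not the only valid splitting rule.

  import numpy as np
  from sklearn.cluster import KMeans
  from sklearn.datasets import make_blobs

  def divisive(points, indices):
      # Stop once a cluster cannot be split any further
      if len(indices) < 2:
          return [indices]
      # Approximate the "best division" with a 2-means split
      labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points[indices])
      left, right = indices[labels == 0], indices[labels == 1]
      if len(left) == 0 or len(right) == 0:
          return [indices]
      # Recurse on both halves until every point is on its own
      return divisive(points, left) + divisive(points, right)

  X, _ = make_blobs(n_samples=20, centers=3, random_state=1)
  leaves = divisive(X, np.arange(len(X)))
  print(len(leaves), "leaf clusters")   # one leaf per point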

Key Concepts

Dendrogram

A tree diagram that shows the hierarchical relationships between the objects being clustered. It is used to visualize the results of hierarchical clustering: each merge appears as a joining of branches, and the height at which two branches join indicates the distance between the clusters being combined.
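
A dendrogram can be drawn directly from the recorded merge history; here is a minimal sketch with SciPy and matplotlib (both assumed to be installed).

  import numpy as np
  import matplotlib.pyplot as plt
  from scipy.cluster.hierarchy import linkage, dendrogram

  rng = np.random.default_rng(0)
  X = rng.normal(size=(30, 2))    # 30 illustrative 2-D points

  Z = linkage(X, method="ward")   # linkage matrix: the full merge history
  dendrogram(Z)                   # leaves are points, branch height = merge distance
  plt.xlabel("data point index")
  plt.ylabel("merge distance")
  plt.show()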

Linkage Criteria

Methods for defining the distance between two clusters (as opposed to two points), which determine which clusters are merged at each step; their effect is compared in the sketch after this list:

  • Single: Minimum distance between any point in one cluster and any point in the other
  • Complete: Maximum distance between any point in one cluster and any point in the other
  • Average: Average of all pairwise distances between points in the two clusters
  • Ward: Merge the pair of clusters that gives the smallest increase in total within-cluster variance
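
A quick way to see their effect is to cluster the same data under each criterion and compare the resulting groups; a small sketch with SciPy (assumed to be installed):

  import numpy as np
  from scipy.cluster.hierarchy import linkage, fcluster

  rng = np.random.default_rng(0)
  X = rng.normal(size=(40, 2))

  for method in ("single", "complete", "average", "ward"):
      Z = linkage(X, method=method)                     # merge history under this criterion
      labels = fcluster(Z, t=3, criterion="maxclust")   # cut into (at most) 3 clusters
      print(method, "cluster sizes:", np.bincount(labels)[1:])  # labels start at 1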

Distance Metrics

Measures used to quantify the dissimilarity between individual data points; a few common choices are compared in the sketch after this list:

  • Euclidean: Straight-line distance between points
  • Manhattan: Sum of absolute coordinate differences
  • Cosine: One minus the cosine of the angle between the vectors, ignoring their magnitudes
  • Correlation: One minus the Pearson correlation, capturing linear relationships rather than absolute values
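
The metric is chosen independently of the linkage criterion; the sketch below (assuming SciPy is available) computes the pairwise distances under each metric for the same three points and feeds one of them into the linkage step.

  import numpy as np
  from scipy.spatial.distance import pdist
  from scipy.cluster.hierarchy import linkage

  X = np.array([[1.0, 0.0, 2.0],
                [0.0, 1.0, 3.0],
                [2.0, 3.0, 1.0]])

  for metric in ("euclidean", "cityblock", "cosine", "correlation"):
      D = pdist(X, metric=metric)       # condensed vector of pairwise distances
      print(metric, np.round(D, 3))     # "cityblock" is SciPy's name for Manhattan

  # A precomputed condensed distance matrix can be passed straight to linkage
  Z = linkage(pdist(X, metric="cityblock"), method="average")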

Cutting the Dendrogram

The process of selecting a specific number of clusters by cutting the dendrogram at a chosen height (a distance threshold): every branch that is still separate below the cut becomes its own cluster. This allows the level of granularity to be chosen after the hierarchy has been built.
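
With SciPy this corresponds to fcluster, which turns one linkage matrix into different flat clusterings depending on where the cut is made; a minimal sketch (assuming SciPy is installed):

  import numpy as np
  from scipy.cluster.hierarchy import linkage, fcluster

  rng = np.random.default_rng(0)
  X = rng.normal(size=(30, 2))
  Z = linkage(X, method="ward")

  # Cut at a fixed height: every merge above distance 5 is undone
  by_height = fcluster(Z, t=5.0, criterion="distance")

  # Or ask directly for a fixed number of clusters
  by_count = fcluster(Z, t=4, criterion="maxclust")

  print("clusters at height 5:", len(np.unique(by_height)))
  print("clusters when asking for 4:", len(np.unique(by_count)))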

Advantages and Limitations

Advantages

  • No need to specify number of clusters in advance
  • Results are deterministic (no dependence on random initialization, unlike K-means)
  • Produces a dendrogram for visualization
  • Can uncover hierarchical relationships
  • Can capture clusters of varying shapes, depending on the linkage used (single linkage, for example, can follow elongated clusters)

Limitations

  • Computationally intensive (O(n³) time for a naive implementation)
  • Memory intensive for large datasets (the pairwise distance matrix alone requires O(n²) storage)
  • Sensitive to outliers
  • Greedy, irreversible decisions (a merge or split cannot be undone in later steps)
  • Different linkage criteria can produce different results