What Is Union-Find?
Union-Find is a data structure that tracks elements partitioned into disjoint sets (groups that don’t overlap). It supports two operations:
-
Find(x): Which group does element x belong to? (returns the group’s representative/root)
-
Union(x, y): Merge the groups containing x and y
Both operations are nearly O(1) with two optimizations: path compression and union by rank.
The Mental Model
Imagine people at a party forming groups:
-
Initially, everyone is in their own group (standing alone)
-
Union(Alice, Bob): Alice and Bob’s groups merge (they start chatting)
-
Find(Alice): "Who is the 'leader' of Alice’s group?" → returns the group’s representative
The key question Union-Find answers efficiently: "Are these two elements in the same group?"
Implementation
class UnionFind:
def __init__(self, size: int):
# parent[i] = i means i is its own root (initial state)
self.parent = list(range(size))
self.rank = [0] * size
def find(self, x: int) -> int:
"""Find root with PATH COMPRESSION."""
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x]) # Flatten the tree
return self.parent[x]
def union(self, x: int, y: int) -> bool:
"""Merge groups. Returns False if already same group."""
root_x, root_y = self.find(x), self.find(y)
if root_x == root_y:
return False # Already connected
# UNION BY RANK: attach shorter tree under taller tree
if self.rank[root_x] < self.rank[root_y]:
self.parent[root_x] = root_y
elif self.rank[root_x] > self.rank[root_y]:
self.parent[root_y] = root_x
else:
self.parent[root_y] = root_x
self.rank[root_x] += 1
return True
def connected(self, x: int, y: int) -> bool:
"""Check if x and y are in the same group. O(α(n)) ≈ O(1)."""
return self.find(x) == self.find(y)
Why Path Compression + Union by Rank = Nearly O(1)
Path compression: After find(x), every node on the path from x to root becomes a direct child of the root. This flattens the tree, making future finds faster.
Union by rank: Always attaches the shorter tree under the taller tree, preventing the tree from becoming a linked list.
Together, both operations are O(α(n)) amortized, where α is the inverse Ackermann function. This function grows so incredibly slowly that for all practical purposes (n < 10^80, the number of atoms in the universe), α(n) < 5. So it’s effectively O(1).
Complexity
| Operation | Amortized Complexity |
|---|---|
|
O(α(n)) ≈ O(1) |
|
O(α(n)) ≈ O(1) |
|
O(α(n)) ≈ O(1) |
Space |
O(n) |
When to Use Union-Find
-
Count connected components (Number of Islands — Ch 9)
-
Detect cycles in undirected graphs (if find(x) == find(y) before union → cycle!)
-
Kruskal’s MST algorithm (process edges in weight order, union endpoints)
-
Dynamic connectivity ("are x and y connected after adding this edge?")
|
Union-Find vs BFS/DFS for connectivity: - BFS/DFS: Better for one-time traversal or when you need the path - Union-Find: Better for INCREMENTAL connectivity (adding edges over time) and counting components |
Exercises
-
Redundant Connection — find the cycle-causing edge
-
Number of Islands was covered in Chapter 09 using Union-Find