143. Vec::dedup_by_key — Collapse Consecutive Duplicates by a Derived Key
Vec::dedup() only collapses runs that are exactly equal. When you care about a derived attribute — the minute on a timestamp, the domain in an email, the first letter of a word — reach for dedup_by_key.
The plain dedup() is strict: two adjacent elements are only merged if == says so.
| |
But often “duplicate” really means “shares some property with its neighbour.” dedup_by_key takes a closure that maps each element to a key, then keeps the first of every consecutive run whose keys match:
| |
1, 3, 5 all have key 1 → keep 1. Then 2, 4 have key 0 → keep 2. Then 7 has key 1 → keep it. Then 6, 8 have key 0 → keep 6.
The practical case: log lines already in time order, and you want one representative per minute.
| |
Two things worth remembering. First, it only looks at adjacent pairs — if you need full uniqueness, pair it with sort_by_key first. Second, if your equivalence isn’t expressible as a key (e.g. “values within 0.1 of each other”), there’s a sibling dedup_by that takes |a, b| -> bool directly. All three are in-place, allocation-free, and run in linear time.