Performance

197. Advance a State Machine with mem::replace — Move the Enum Out, No Clone

Transitioning an enum state behind &mut self looks impossible: you can’t move the old variant’s owned data into the new one without the borrow checker stopping you — so people reach for .clone(). mem::replace lets you move the whole state out, leaving a cheap placeholder behind.

This closes out the performance week. Earlier bites covered mem::take, mem::replace, and mem::swap as primitives. Here’s the pattern they were built for: a state machine that moves owned data forward through its transitions.

The setup

A job that walks Queued → Running → Done, carrying an owned String payload from one state into the next:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
#[derive(Debug, PartialEq)]
enum Stage {
    Queued { payload: String },
    Running { payload: String, worker: u32 },
    Done { result: String },
}

struct Job {
    stage: Stage,
}

The trap: matching a borrow forces a clone

You only have &mut self, so the obvious move is to match on &self.stage. But that gives you a borrow of payload — to put it in the next state you have to clone it:

1
2
3
4
5
6
7
8
9
fn advance(&mut self) {
    self.stage = match &self.stage {
        Stage::Queued { payload } => Stage::Running {
            payload: payload.clone(), // borrowed, so clone to reuse
            worker: 7,
        },
        // ...
    };
}

Matching on self.stage by value would move out of &mut self — the borrow checker rejects it outright. So clone feels like the only way out. It isn’t.

The fix: replace the whole state, then match by value

mem::replace swaps in a cheap placeholder and hands you the real state by value. Now the match owns payload and can move it straight into the next variant — zero clones:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
use std::mem;

impl Job {
    fn advance(&mut self) {
        self.stage = match mem::replace(&mut self.stage, Stage::Done { result: String::new() }) {
            Stage::Queued { payload } => Stage::Running { payload, worker: 7 },
            Stage::Running { payload, worker } => {
                Stage::Done { result: format!("{payload}@{worker}") }
            }
            done => done, // terminal state stays put
        };
    }
}

The placeholder (Done { result: String::new() }) is free — an empty String allocates nothing — and it lives for only the instant before you overwrite self.stage with the match result.

1
2
3
4
5
6
7
let mut job = Job { stage: Stage::Queued { payload: "build".into() } };

job.advance();
assert_eq!(job.stage, Stage::Running { payload: "build".into(), worker: 7 });

job.advance();
assert_eq!(job.stage, Stage::Done { result: "build@7".into() });

The payload string is allocated once and threaded through all three states by pointer. No copy of the bytes ever happens — exactly what the clone-based version threw away on every transition.

196. Return impl Iterator, Not Vec — Let the Caller Decide What to Do

Returning a Vec from a helper allocates eagerly, every time — even when the caller only wants the first match or a running sum. Return impl Iterator instead and the allocation simply never happens unless the caller asks for it.

This is the function-boundary version of yesterday’s bite-195: chaining adapters avoids temporary Vecs inside a pipeline; returning impl Iterator avoids forcing one across a function call.

The eager version

A helper that builds and returns a Vec commits to a heap allocation and a full pass over the data before the caller has said what they want:

1
2
3
fn evens_doubled(nums: &[i32]) -> Vec<i32> {
    nums.iter().filter(|&&n| n % 2 == 0).map(|&n| n * 2).collect()
}

If the caller just wants the first result, they still pay for the whole Vec:

1
2
3
let data = [1, 2, 3, 4, 5, 6];
let first = evens_doubled(&data).into_iter().next(); // allocated all 3, used 1
assert_eq!(first, Some(4));

Hand back the iterator instead

Drop the .collect() and return the lazy iterator. The + '_ ties its lifetime to the borrowed slice:

1
2
3
fn evens_doubled(nums: &[i32]) -> impl Iterator<Item = i32> + '_ {
    nums.iter().filter(|&&n| n % 2 == 0).map(|&n| n * 2)
}

Now nothing runs until the caller pulls values through — and they pick the consumer:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
let data = [1, 2, 3, 4, 5, 6];

let v: Vec<i32> = evens_doubled(&data).collect(); // collect if you want to
assert_eq!(v, [4, 8, 12]);

let total: i32 = evens_doubled(&data).sum();       // or fold straight to a number
assert_eq!(total, 24);

let first_big = evens_doubled(&data).find(|&n| n > 5); // or short-circuit
assert_eq!(first_big, Some(8));                    // stops at 8, never doubles 6

The find call never allocates and never touches the last element. The Vec-returning version couldn’t do that — collect() always drains the whole thing first.

The one rule: don’t borrow a local

The iterator you return can borrow your parameters, but not data you created inside the function — that data is dropped when the function ends. Iterators over owned values (like a Range) carry no borrow, so they just work:

1
2
3
4
5
6
fn squares(n: u32) -> impl Iterator<Item = u32> {
    (1..=n).map(|x| x * x)
}

let sq: Vec<u32> = squares(4).collect();
assert_eq!(sq, [1, 4, 9, 16]);

If you must produce owned data inside the function and stream it out, move it into the iterator (e.g. vec.into_iter() or a move closure) rather than returning a borrow of a local.

195. Chain Iterator Adapters — Don't collect() Between Every Step

Every collect::<Vec<_>>() in the middle of a pipeline is a heap allocation and a full pass over your data. Adapters like map and filter are lazy and fuse together — chain them and the whole transformation runs in one pass with zero temporary Vecs.

A Vec between every step

It’s tempting to do one transformation at a time, binding each result to a variable. Every collect() allocates a throwaway Vec and walks the entire sequence before the next step even starts:

1
2
3
4
5
6
7
let nums = [1, 2, 3, 4, 5, 6];

let doubled: Vec<i32> = nums.iter().map(|&n| n * 2).collect();
let big: Vec<i32> = doubled.into_iter().filter(|&n| n % 4 == 0).collect();
let sum: i32 = big.iter().sum();

assert_eq!(sum, 24);

Two intermediate Vecs, two extra allocations, three separate passes — all to compute a single number.

One chain, one pass, no temporaries

The adapters compose directly. Nothing is materialized until the final consumer (sum) pulls values through, so there are no intermediate collections at all:

1
2
3
4
5
6
7
8
9
let nums = [1, 2, 3, 4, 5, 6];

let sum: i32 = nums
    .iter()
    .map(|&n| n * 2)        // 2, 4, 6, 8, 10, 12
    .filter(|&n| n % 4 == 0) // 4, 8, 12
    .sum();                  // 24

assert_eq!(sum, 24);

Each element flows through map then filter then into the sum, one at a time. No buffer is ever allocated.

Laziness means short-circuiting works

Because nothing runs until pulled, a chain only does the work it needs. Add a take(2) and the pipeline stops after producing two results — the elements past that point are never touched:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
use std::cell::Cell;

let visited = Cell::new(0);
let nums = [1, 2, 3, 4, 5, 6, 7, 8];

let first_two: Vec<i32> = nums
    .iter()
    .inspect(|_| visited.set(visited.get() + 1))
    .map(|&n| n * 10)
    .filter(|&n| n > 20)
    .take(2)
    .collect();

assert_eq!(first_two, [30, 40]);
assert_eq!(visited.get(), 4); // stopped early — never looked at 5..8

The intermediate-collect version can’t do this: collect() always drains the whole iterator, so it would have visited all eight elements before take ever saw one.

When you genuinely do need a Vec

The point isn’t “never collect” — it’s “don’t collect between steps.” Collect once, at the end, when you actually need an owned, reusable collection:

1
2
3
4
5
6
7
8
9
let words = ["fast", "lazy", "fused", "iter"];

let shouted: Vec<String> = words
    .iter()
    .filter(|w| w.len() == 4)
    .map(|w| w.to_uppercase())
    .collect();

assert_eq!(shouted, ["FAST", "LAZY", "ITER"]);

One collect, at the end, when the Vec is the actual result. Everything before it stays lazy and allocation-free.

194. Reuse One Buffer with .clear() — Allocate Once, Loop Many Times

with_capacity (bite 193) buys a buffer once instead of growing it repeatedly. But if you allocate a fresh String or Vec inside a loop, you throw that buffer away every iteration. .clear() resets the length to zero while keeping the capacity — so one allocation serves the whole loop.

A fresh allocation every iteration

It’s easy to declare the working buffer inside the loop. Each pass allocates a new heap buffer and drops it at the end of the iteration:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
let lines = ["alpha", "beta", "gamma"];
let mut out = Vec::new();

for line in lines {
    let mut buf = String::new();   // new heap allocation, every iteration
    buf.push_str(line);
    buf.make_ascii_uppercase();
    out.push(buf.clone());
}

assert_eq!(out, ["ALPHA", "BETA", "GAMMA"]);

Three iterations, three allocate-then-free cycles for the scratch buffer. Scale that to a million lines and it’s a million wasted allocations.

.clear() keeps the capacity

Hoist the buffer out of the loop and clear() it at the top of each pass. clear() sets the length to 0 but leaves the allocated capacity in place, so after the first iteration the buffer is already big enough and never reallocates:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
let lines = ["alpha", "beta", "gamma"];
let mut out = Vec::new();
let mut buf = String::new();       // allocated once

for line in lines {
    buf.clear();                   // len -> 0, capacity untouched
    buf.push_str(line);
    buf.make_ascii_uppercase();
    out.push(buf.clone());
}

assert_eq!(out, ["ALPHA", "BETA", "GAMMA"]);

The contract is the whole point — clear drops the contents but not the buffer:

1
2
3
4
5
6
7
let mut s = String::with_capacity(64);
s.push_str("hello");
let cap = s.capacity();

s.clear();
assert_eq!(s.len(), 0);            // empty again
assert_eq!(s.capacity(), cap);     // ...but the buffer is still there

The read-into-a-reused-buffer pattern

This shows up constantly when reading input. BufRead::read_line appends to the buffer you give it, so the idiomatic loop clears one String each pass instead of allocating a new one per line:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
use std::io::BufRead;

let input = "12\n34\n56\n";
let mut reader = std::io::BufReader::new(input.as_bytes());

let mut line = String::new();      // one buffer for every line
let mut sum = 0i64;

loop {
    line.clear();                  // required — read_line appends
    let n = reader.read_line(&mut line).unwrap();
    if n == 0 {
        break;                     // 0 bytes read == EOF
    }
    sum += line.trim().parse::<i64>().unwrap();
}

assert_eq!(sum, 102);

The same trick works for any scratch Vecclear() it at the top of the loop and reuse the capacity for the next batch:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
let mut scratch: Vec<u8> = Vec::new();
let mut total = 0;

for chunk in [&[1u8, 2, 3][..], &[4, 5], &[6]] {
    scratch.clear();
    scratch.extend_from_slice(chunk);
    total += scratch.iter().map(|&b| b as u32).sum::<u32>();
}

assert_eq!(total, 21);

Reach for a fresh Vec/String only when you actually need to keep each result. When the buffer is just scratch space, allocate it once, clear() it, and let the loop run free.

193. Vec::with_capacity — Size Up Front, Skip the Realloc Churn

A Vec you push into one element at a time doesn’t grow one element at a time — it doubles, copying every existing item to a fresh allocation each time it outgrows its buffer. If you already know how many items are coming, Vec::with_capacity buys the whole buffer once.

The hidden cost of push

Vec::new() starts with zero capacity. As you push, it reallocates geometrically — roughly doubling each time — and every reallocation copies all existing elements into the new, larger buffer. Fill a vector with 1000 items and you pay for that copying around ten times over:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
let mut v = Vec::new();
let mut reallocs = 0;
let mut last_cap = v.capacity();

for i in 0..1000 {
    v.push(i);
    if v.capacity() != last_cap {
        reallocs += 1;        // the buffer just moved
        last_cap = v.capacity();
    }
}
// ~10 reallocations, each copying everything built so far
assert!(reallocs >= 8);

In a hot loop, that churn is pure waste: allocate, copy, free, allocate bigger, copy again.

Reserve the space once

If you know the final size, hand it to Vec::with_capacity. The buffer is allocated a single time, and push never has to move it:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
let mut v = Vec::with_capacity(1000);
let mut reallocs = 0;
let mut last_cap = v.capacity();

for i in 0..1000 {
    v.push(i);
    if v.capacity() != last_cap {
        reallocs += 1;
        last_cap = v.capacity();
    }
}
assert_eq!(reallocs, 0);       // zero — the buffer never moved

Capacity is not length: with_capacity(1000) gives you room for 1000 items but the vector is still empty (len() == 0) until you push.

Already have a Vec? Use reserve

When the vector exists and you’re about to add a known number of elements, reserve grows the buffer ahead of time without touching the contents:

1
2
3
4
5
let mut v = vec![1, 2, 3];
v.reserve(100);               // ensure room for 100 *more*

assert!(v.capacity() >= 103);
assert_eq!(v.len(), 3);       // still 3 elements — only capacity changed

Use reserve before a batch of pushes; use reserve_exact when you want the buffer sized precisely, with no geometric slack.

collect often does this for you

Iterators expose a size_hint, so collecting from a sized iterator already reserves the right capacity — no manual call needed:

1
2
let squares: Vec<i32> = (0..1000).map(|x| x * x).collect();
assert_eq!(squares.len(), 1000);

The win is biggest exactly where it matters: tight loops building large vectors. If you can name the size, name it once and let push run free.

192. impl Into<String> — Take Owned or Borrowed Without an Extra Allocation

Bite 191 said: if you only read the argument, take &str. But what if you need to store it? Taking &str and calling .to_owned() always allocates — even when the caller handed you a String it was about to throw away. impl Into<String> fixes that.

The hidden re-allocation

When a function keeps the value, the “take &str” rule turns into a trap:

1
2
3
4
5
struct Label { text: String }

fn make_label(text: &str) -> Label {
    Label { text: text.to_owned() } // always allocates
}

A literal caller has to allocate eventually — fair enough. But look what happens when the caller already owns a String:

1
2
3
4
let owned = String::from("Status: OK");
let label = make_label(&owned);
// `owned` is copied into a brand-new allocation, then `owned` is dropped.
// We threw away a perfectly good String and allocated a second one.

The caller had an owned buffer it no longer needed, and we ignored it.

Accept anything that becomes a String

Take impl Into<String>. A String moves in with zero copying; a &str allocates exactly once — never more:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
struct Label { text: String }

fn make_label(text: impl Into<String>) -> Label {
    Label { text: text.into() }
}

// Literal: one allocation, unavoidable since we store it.
let a = make_label("Status: OK");

// Owned String: MOVED in. No copy, no second allocation.
let owned = String::from("Status: OK");
let b = make_label(owned);

assert_eq!(a.text, "Status: OK");
assert_eq!(b.text, "Status: OK");

Same call site for both, and the owned case is now free. The conversion happens lazily at the boundary, exactly once, and only when it must.

When you only read: impl AsRef

If you don’t store the value but still want to accept more than deref coercion allows (String, &str, Box<str>, Cow<str>, …), reach for impl AsRef<str>:

1
2
3
4
5
6
7
fn shout(text: impl AsRef<str>) -> String {
    text.as_ref().to_uppercase()
}

assert_eq!(shout("hi"), "HI");                       // &str
assert_eq!(shout(String::from("hi")), "HI");          // String
assert_eq!(shout(Box::<str>::from("hi")), "HI");      // Box<str>

as_ref() is a cheap borrow — no allocation — and the generic accepts every string-like type without forcing the caller to convert first.

The rule of thumb

If the function stores the string, take impl Into<String> so an owned argument moves in for free. If it only reads but you want maximum flexibility, take impl AsRef<str>. Plain &str (bite 191) is still the right default for simple read-only functions — these two just cover the cases it can’t.

191. Accept &str, Not String — Take the Most General Borrow

A function that takes String forces every caller holding a &str to allocate just to call you. Take &str instead — and &[T] over &Vec<T> — and deref coercion lets everyone in for free.

The over-specific signature

This function only ever reads its argument, yet it demands an owned String:

1
2
3
fn greeting(name: String) -> String {
    format!("Hello, {name}!")
}

Now a caller with a string literal — the most common case — has to allocate a whole String just to satisfy the type:

1
let g = greeting("Ferris".to_string()); // pointless allocation

Worse, a caller who only has a borrow (say, a field of someone else’s struct) is stuck: they must .clone() or .to_owned() before they can call you, even though you never keep the value.

Take the borrow

If the body only reads, take &str. Deref coercion means &String and string literals both coerce to &str automatically:

1
2
3
4
5
6
7
fn greeting(name: &str) -> String {
    format!("Hello, {name}!")
}

let owned = String::from("Ferris");
assert_eq!(greeting("Ferris"), "Hello, Ferris!"); // literal, no alloc
assert_eq!(greeting(&owned), "Hello, Ferris!");    // &String coerces to &str

Zero allocations at the call site, and every kind of caller just works.

Same rule for slices

The exact parallel exists for vectors. Taking &Vec<T> locks callers into owning a Vec; taking &[T] accepts a Vec, an array, or any slice:

1
2
3
4
5
6
7
8
9
fn total(nums: &[i32]) -> i32 {
    nums.iter().sum()
}

let v = vec![1, 2, 3];
let a = [4, 5, 6];
assert_eq!(total(&v), 6);       // &Vec<i32> coerces to &[i32]
assert_eq!(total(&a), 15);      // array coerces too
assert_eq!(total(&v[1..]), 5);  // a sub-slice — impossible with &Vec

&Vec<T> couldn’t accept that array or that sub-slice at all. &[T] is strictly more flexible and costs nothing.

The general principle

Borrow the least specific type that still does the job: &str over &String, &[T] over &Vec<T>, &Path over &PathBuf. Owned types in arguments are for when the function actually needs to store the value. If it only reads, hand it the borrow — the caller keeps their allocation, and your function works with more types for free.

#190 Jun 2026

190. Return Cow<str> — Allocate Only When You Actually Change Something

An escaping or normalizing function usually has nothing to do — the input is already clean. Returning String forces an allocation anyway. Return Cow<str> and the common path stays a borrow.

The wasteful version

A function that escapes HTML returns String, so every caller pays for an allocation — even the overwhelming majority whose input contains nothing to escape:

1
2
3
4
5
6
fn escape_html(input: &str) -> String {
    input
        .replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
}

"hello world" has no special characters, yet replace still walks the string three times and hands back a fresh String. In a template renderer or a parser running this over thousands of fields, that’s thousands of pointless heap allocations.

Borrow on the fast path

Cow<str> lets one return type be either a borrow or an owned String. Check first; only allocate when there’s real work:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
use std::borrow::Cow;

fn escape_html(input: &str) -> Cow<str> {
    // Fast path: nothing to escape, hand back the original borrow.
    if !input.contains(['&', '<', '>']) {
        return Cow::Borrowed(input);
    }

    // Slow path: build the escaped String exactly once.
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            _ => out.push(c),
        }
    }
    Cow::Owned(out)
}

The clean input never touches the heap; the dirty input allocates once instead of three times:

1
2
3
4
5
6
let clean = escape_html("hello world");
assert!(matches!(clean, Cow::Borrowed(_))); // zero allocation

let dirty = escape_html("a < b & c");
assert!(matches!(dirty, Cow::Owned(_)));
assert_eq!(dirty, "a &lt; b &amp; c");

Callers don’t notice

Cow<str> derefs to &str, so anything that reads the result just works — no .unwrap(), no matching:

1
2
3
4
5
fn render(field: &str) -> usize {
    escape_html(field).len() // Cow derefs to &str
}

assert_eq!(render("plain"), 5);

And when a caller genuinely needs ownership, .into_owned() allocates only if it’s still borrowed:

1
2
let owned: String = escape_html("safe").into_owned();
assert_eq!(owned, "safe");

The rule: any function that might return its input unchanged — escaping, trimming, normalizing, path canonicalization — should return Cow<str>, not String. The signature tells the caller “I’ll borrow when I can,” and the body only reaches for the heap on the path that earns it.

163. Cow::to_mut — Lazy In-Place Mutation Through Cow

Cow<str> is the type everyone reaches for when a function might need to modify its input. Cow::Borrowed and Cow::Owned are the constructors that get the spotlight; to_mut is the third piece, and it’s the one that actually pays off the laziness.

What to_mut does

to_mut takes &mut Cow<str> and hands back &mut String:

  • If the Cow is already Owned, you get a direct &mut to the inner String.
  • If it’s Borrowed, to_mut clones the slice into a fresh String, swaps the Cow over to Owned, and then hands you the mutable reference.

That asymmetry is the whole point. Many callers borrow and never touch to_mut — they never allocate. The ones that do call it pay the allocation cost exactly once, on first write.

A walking-the-string example

Expand \t into two spaces, but only allocate if the input actually contains a tab:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
use std::borrow::Cow;

fn expand_tabs(s: &str) -> Cow<'_, str> {
    let mut out: Cow<'_, str> = Cow::Borrowed(s);
    if let Some(i) = s.find('\t') {
        // First write — `to_mut` clones the slice into a String, then we
        // rebuild from byte `i` onwards.
        let buf = out.to_mut();
        buf.truncate(i);
        for c in s[i..].chars() {
            if c == '\t' {
                buf.push_str("  ");
            } else {
                buf.push(c);
            }
        }
    }
    out
}

The happy path — input has no tab — never enters the if, never allocates, and returns the original slice wrapped in Cow::Borrowed. The unhappy path allocates exactly once.

Composing transformations

to_mut really earns its keep when you chain several optional mutations. The first one that fires flips the Cow to Owned; every following mutation sees an already-owned buffer and reuses it:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
use std::borrow::Cow;

fn apply_rules<'a>(s: &'a str, rules: &[(char, &str)]) -> Cow<'a, str> {
    let mut out: Cow<'a, str> = Cow::Borrowed(s);
    for &(from, to) in rules {
        if out.contains(from) {
            let replaced = out.replace(from, to);
            *out.to_mut() = replaced;
        }
    }
    out
}

Three things worth pointing at. First, out.contains(from) works because Cow<str> derefs to str. Second, the assignment *out.to_mut() = replaced replaces the inner String, not the Cow itself. Third, once the first rule fires, all subsequent to_mut calls are a no-op &mut String — no extra clones.

Pitfall: to_mut always commits

There’s no “preview, then maybe commit” mode. Calling to_mut on a borrowed Cow clones immediately, even if you never end up writing through the returned reference. So this is a trap:

1
2
3
4
if !out.is_empty() {
    let _ = out.to_mut();  // allocates even though we may not change anything
    // ... maybe mutate, maybe not
}

Guard the call with the actual condition that means “I’m about to write,” not the condition that means “I might.” The mental shortcut: to_mut is the moment you trade your &str for a String. Reach for it lazily, but commit completely.

#104 Apr 2026

104. Vec::swap_remove — O(1) Removal When Order Doesn't Matter

Calling Vec::remove(i) shifts every element after i left by one — that’s O(n) per call. If you don’t care about order, swap_remove does the same job in O(1).

The hidden cost of Vec::remove

remove takes the value out of the vector, but to keep the rest in order it has to shift every following element left by one slot. On a long vector, removing near the front is a giant memcpy:

1
2
3
4
5
let mut v = vec![1, 2, 3, 4, 5];
let removed = v.remove(1);

assert_eq!(removed, 2);
assert_eq!(v, [1, 3, 4, 5]);

Inside a loop — “remove every item that matches X” — that O(n) cost compounds into O(n²). A tight loop that should be linear silently becomes quadratic.

swap_remove: trade order for speed

swap_remove(i) removes the element at index i by swapping it with the last element, then popping the tail. No shifting, no memcpy chain — just one swap and a pop:

1
2
3
4
5
6
let mut v = vec![1, 2, 3, 4, 5];
let removed = v.swap_remove(1);

assert_eq!(removed, 2);
// 5 took 2's spot — order is gone, but the op was O(1).
assert_eq!(v, [1, 5, 3, 4]);

If your Vec is really a set — entities in a game world, pending jobs in a worker pool, items keyed by id elsewhere — order rarely matters. swap_remove is free.

Removing many items in place

The classic mistake: iterate forward calling remove. swap_remove lets you do it without the O(n²) blow-up. Walk by index and don’t bump the cursor when you remove:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
let mut nums = vec![1, 2, 3, 4, 5, 6, 7, 8];

let mut i = 0;
while i < nums.len() {
    if nums[i] % 2 == 0 {
        nums.swap_remove(i);
        // Don't increment — a new element is now at position i.
    } else {
        i += 1;
    }
}

nums.sort(); // canonicalize for the assert
assert_eq!(nums, [1, 3, 5, 7]);

For bulk predicate-based filtering, Vec::retain stays cleaner (and is also O(n)). Reach for swap_remove when you have a specific index you want gone in O(1).

With structs

swap_remove returns the removed value, so you can hand it off to another collection on the way out:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
#[derive(PartialEq, Debug)]
struct Job { id: u32 }

let mut queue = vec![
    Job { id: 1 },
    Job { id: 2 },
    Job { id: 3 },
    Job { id: 4 },
];

let pulled = queue.swap_remove(0);

assert_eq!(pulled, Job { id: 1 });
assert_eq!(queue.len(), 3);

When to reach for it

Use swap_remove whenever you’d write remove(i) and the surrounding code doesn’t depend on element order. Order-preserving removal still needs remove, but most “remove one item from this Vec” code is order-agnostic — and swap_remove turns an O(n) pothole into a one-line O(1) win. Stable since Rust 1.0, one of those quietly-essential APIs that’s easy to forget exists.

98. sort_by_cached_key — Stop Recomputing Expensive Sort Keys

sort_by_key sounds like it computes the key once per element. It doesn’t — it calls your closure at every comparison, so an n-element sort can pay for that key O(n log n) times. If the key is expensive, sort_by_cached_key is the fix you’ve been looking for.

The trap

The signature reads nicely: “sort by this key.” The implementation, less so — the closure fires on every comparison, not once per element:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
use std::cell::Cell;

let mut items = vec!["banana", "fig", "apple", "cherry", "date"];
let calls = Cell::new(0);

items.sort_by_key(|s| {
    calls.set(calls.get() + 1);
    // Pretend this is a heavy computation: allocating, hashing,
    // parsing, calling a regex, opening a file, etc.
    s.to_string()
});

// 5 elements, but the key ran way more than 5 times.
assert!(calls.get() > items.len());
assert_eq!(items, ["apple", "banana", "cherry", "date", "fig"]);

For identity keys that cost a pointer-deref, nobody cares. For anything that allocates.to_string(), .to_lowercase(), format!(...), a regex capture, a trimmed-and-lowered filename — the cost compounds quickly. I’ve seen a profile where 80% of total runtime was the key closure being called 40,000 times to sort 2,000 items.

The fix

slice::sort_by_cached_key runs your closure exactly once per element, stashes the results in a scratch buffer, then sorts against the cache. This is the Schwartzian transform, wrapped up in a method call:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
use std::cell::Cell;

let mut items = vec!["banana", "fig", "apple", "cherry", "date"];
let calls = Cell::new(0);

items.sort_by_cached_key(|s| {
    calls.set(calls.get() + 1);
    s.to_string()
});

// Exactly one call per element — no matter how big the slice is.
assert_eq!(calls.get(), items.len());
assert_eq!(items, ["apple", "banana", "cherry", "date", "fig"]);

Same result, linear key-function calls. The memory trade is a Vec<(K, usize)> the size of the slice — cheap next to the cost of re-running an allocating closure on every compare.

When to reach for which

The rule is about where your time goes, not how fancy the key looks:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
let mut nums = vec![5u32, 2, 8, 1, 9, 3];

// Trivial key: sort_by_key is fine (and avoids the scratch alloc).
nums.sort_by_key(|n| *n);
assert_eq!(nums, [1, 2, 3, 5, 8, 9]);

// Expensive key: sort_by_cached_key wins.
let mut files = vec!["Cargo.TOML", "src/MAIN.rs", "README.md", "build.RS"];
files.sort_by_cached_key(|path| path.to_lowercase());
assert_eq!(files, ["build.RS", "Cargo.TOML", "README.md", "src/MAIN.rs"]);

Use sort_by_key for cheap, Copy-ish keys. Use sort_by_cached_key the moment your closure allocates, hashes, parses, or otherwise does real work — it’s the difference between O(n log n) and O(n) calls to that closure.

90. black_box — Stop the Compiler From Erasing Your Benchmarks

Your benchmark ran in 0 nanoseconds? Congratulations — the compiler optimised away the code you were trying to measure. std::hint::black_box prevents that by hiding values from the optimiser.

The problem: the optimiser is too smart

The Rust compiler aggressively eliminates dead code. If it can prove a result is never used, it simply removes the computation:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
fn sum_range(n: u64) -> u64 {
    (0..n).sum()
}

fn main() {
    let start = std::time::Instant::now();
    let result = sum_range(10_000_000);
    let elapsed = start.elapsed();

    // Without using `result`, the compiler may skip the entire computation
    println!("took: {elapsed:?}");

    // Force the result to be "used" so the above isn't optimised out
    assert!(result > 0);
}

In release mode, the compiler can see through this and may still optimise the loop away — or even compute the answer at compile time. Your benchmark reports near-zero time, and you learn nothing.

Enter black_box

std::hint::black_box takes a value and returns it unchanged, but the compiler treats it as an opaque barrier — it can’t see through it, so it can’t optimise away whatever produced that value:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
use std::hint::black_box;

fn sum_range(n: u64) -> u64 {
    (0..n).sum()
}

fn main() {
    let start = std::time::Instant::now();
    let result = sum_range(black_box(10_000_000));
    let _ = black_box(result);
    let elapsed = start.elapsed();

    println!("sum = {result}, took: {elapsed:?}");
}

Two black_box calls do the trick:

  1. Wrap the input — prevents the compiler from constant-folding the argument
  2. Wrap the output — prevents dead-code elimination of the computation

Before and after

Without black_box (release mode):

1
sum = 49999995000000, took: 83ns     ← suspiciously fast

With black_box (release mode):

1
sum = 49999995000000, took: 5.612ms  ← actual work

It works on any type

black_box is generic — it works on integers, strings, structs, references, whatever:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
use std::hint::black_box;

fn main() {
    // Hide a vector from the optimiser
    let data: Vec<i32> = black_box(vec![1, 2, 3, 4, 5]);
    let total: i32 = data.iter().sum();
    let total = black_box(total);

    assert_eq!(total, 15);
}

Micro-benchmark recipe

Here’s a minimal pattern for quick-and-dirty micro-benchmarks:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
use std::hint::black_box;
use std::time::Instant;

fn fibonacci(n: u32) -> u64 {
    match n {
        0 => 0,
        1 => 1,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

fn main() {
    let iterations = 100;
    let start = Instant::now();
    for _ in 0..iterations {
        black_box(fibonacci(black_box(30)));
    }
    let elapsed = start.elapsed();
    println!("{iterations} runs in {elapsed:?} ({:?}/iter)", elapsed / iterations);
}

Without black_box, the compiler could hoist the pure function call out of the loop or eliminate it entirely. With it, each iteration does real work.

When to use it

Reach for black_box whenever you’re timing code and the results look suspiciously fast. It’s also the foundation that benchmarking frameworks like criterion and the built-in #[bench] use under the hood.

It’s not a full benchmarking harness — for serious measurement you still want warmup, statistics, and outlier detection. But when you need a quick sanity check, black_box + Instant gets the job done.

Available since Rust 1.66 on stable.

89. cold_path — Tell the Compiler Which Branch Won't Happen

Your error-handling branch fires once in a million calls, but the compiler doesn’t know that. core::hint::cold_path lets you mark unlikely branches so the optimiser can focus on the hot path.

Why branch layout matters

Modern CPUs predict which way a branch will go and speculatively execute instructions ahead of time. When the prediction is right, execution flies. When it’s wrong, the pipeline stalls.

Compilers already try to guess which branches are hot, but they don’t always get it right — especially when both sides of an if look equally plausible from a static analysis perspective. That’s where cold_path comes in.

The basics

Call cold_path() at the start of a branch that is rarely taken:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
use std::hint::cold_path;

fn process(value: Option<u64>) -> u64 {
    if let Some(v) = value {
        v * 2
    } else {
        cold_path();
        log_miss();
        0
    }
}

fn log_miss() {
    // imagine logging or metrics here
}

The compiler now knows the else arm is unlikely. It can lay out the machine code so the hot path (the Some arm) has no jumps, keeping it in the instruction cache and the branch predictor happy.

In match expressions

cold_path works well in match arms too — mark the rare variants:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
use std::hint::cold_path;

fn handle_status(code: u16) -> &'static str {
    match code {
        200 => "ok",
        301 => "moved",
        404 => { cold_path(); "not found" }
        500 => { cold_path(); "server error" }
        _   => { cold_path(); "unknown" }
    }
}

Only the branches you expect to be common stay on the fast track.

Building likely and unlikely helpers

If you’ve used C/C++, you might miss __builtin_expect. With cold_path you can build the same thing:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
use std::hint::cold_path;

#[inline(always)]
const fn likely(b: bool) -> bool {
    if !b { cold_path(); }
    b
}

#[inline(always)]
const fn unlikely(b: bool) -> bool {
    if b { cold_path(); }
    b
}

fn check_bounds(index: usize, len: usize) -> bool {
    if unlikely(index >= len) {
        panic!("out of bounds: {} >= {}", index, len);
    }
    true
}

Now you can annotate conditions directly instead of marking individual branches.

A word of caution

Misusing cold_path on a branch that actually runs often can hurt performance — the compiler will deprioritise it, and you’ll get more pipeline stalls, not fewer. Always benchmark before sprinkling hints around. Profile first, hint second.

The bottom line

cold_path is a zero-cost, zero-argument function that tells the optimiser what you already know: this branch is the exception, not the rule. It’s a small tool, but in hot loops and latency-sensitive code, it can make a measurable difference.

Stabilised in Rust 1.95 (April 2026).

76. slice::as_chunks — Split Slices into Fixed-Size Arrays

You’re calling .chunks(4) and immediately doing chunk.try_into().unwrap() to get an array. as_chunks gives you &[[T; N]] directly — a slice of properly typed arrays, plus the remainder.

The Problem

When you use .chunks(N), each chunk is a &[T] — a dynamically sized slice. The compiler doesn’t know its length, so you’re stuck converting manually:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
fn sum_pairs(data: &[i32]) -> Vec<i32> {
    data.chunks(2)
        .filter(|c| c.len() == 2) // skip incomplete last chunk
        .map(|c| c[0] + c[1])     // runtime indexing, no guarantees
        .collect()
}

fn main() {
    let values = [1, 2, 3, 4, 5];
    assert_eq!(sum_pairs(&values), vec![3, 7]);
}

That works, but the compiler can’t verify your index access at compile time. You’re also throwing away the last chunk if it’s incomplete, with no easy way to inspect it.

After: as_chunks

Stabilized in Rust 1.88, as_chunks splits a slice into a &[[T; N]] and a remainder &[T] in one call:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
fn sum_pairs(data: &[i32]) -> Vec<i32> {
    let (chunks, _remainder) = data.as_chunks::<2>();
    chunks.iter().map(|[a, b]| a + b).collect()
}

fn main() {
    let values = [1, 2, 3, 4, 5];
    assert_eq!(sum_pairs(&values), vec![3, 7]);

    // The remainder is available too
    let (chunks, remainder) = values.as_chunks::<2>();
    assert_eq!(chunks, &[[1, 2], [3, 4]]);
    assert_eq!(remainder, &[5]);
}

Each chunk is &[i32; 2], so you can pattern-match [a, b] directly. The compiler knows the size — no bounds checks, no panics.

Processing Fixed-Width Records

Parsing binary data with fixed-width fields is where as_chunks shines. Imagine RGB pixel data packed as bytes:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
fn brighten(pixels: &mut [u8], amount: u8) {
    let (chunks, _) = pixels.as_chunks_mut::<3>();
    for [r, g, b] in chunks {
        *r = r.saturating_add(amount);
        *g = g.saturating_add(amount);
        *b = b.saturating_add(amount);
    }
}

fn main() {
    let mut pixels = [100, 150, 200, 50, 60, 70, 255, 128, 0];
    brighten(&mut pixels, 30);
    assert_eq!(pixels, [130, 180, 230, 80, 90, 100, 255, 158, 30]);
}

No manual stride arithmetic. Each iteration gives you exactly 3 bytes, pattern-matched into r, g, b.

Don’t Lose the Remainder

Unlike chunks_exact() where you call .remainder() on the iterator after consuming it, as_chunks returns the remainder upfront:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
fn main() {
    let data = [10, 20, 30, 40, 50, 60, 70];

    let (fours, rest) = data.as_chunks::<4>();
    assert_eq!(fours.len(), 1);       // one complete chunk: [10, 20, 30, 40]
    assert_eq!(fours[0], [10, 20, 30, 40]);
    assert_eq!(rest, &[50, 60, 70]);  // leftover elements

    // as_rchunks starts from the right instead
    let (rest, fours) = data.as_rchunks::<4>();
    assert_eq!(rest, &[10, 20, 30]);
    assert_eq!(fours[0], [40, 50, 60, 70]);
}

as_rchunks is the mirror — it aligns chunks to the end, putting the remainder at the front. Useful when your trailing data is the structured part (e.g., a checksum or footer).

The Full Family

All stabilized in Rust 1.88, these come in immutable and mutable variants:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
fn main() {
    let data: &[u8] = &[1, 2, 3, 4, 5, 6, 7];

    // as_chunks — align from left, remainder on right
    let (chunks, rem) = data.as_chunks::<3>();
    assert_eq!(chunks, &[[1, 2, 3], [4, 5, 6]]);
    assert_eq!(rem, &[7]);

    // as_rchunks — align from right, remainder on left
    let (rem, chunks) = data.as_rchunks::<3>();
    assert_eq!(rem, &[1]);
    assert_eq!(chunks, &[[2, 3, 4], [5, 6, 7]]);

    // Mutable versions: as_chunks_mut, as_rchunks_mut
    let mut buf = [0u8; 6];
    let (chunks, _) = buf.as_chunks_mut::<2>();
    chunks[0] = [0xCA, 0xFE];
    chunks[1] = [0xBA, 0xBE];
    chunks[2] = [0xDE, 0xAD];
    assert_eq!(buf, [0xCA, 0xFE, 0xBA, 0xBE, 0xDE, 0xAD]);
}

Whenever you’re reaching for .chunks(N) with a compile-time constant, as_chunks::<N>() gives you stronger types, better ergonomics, and the remainder without gymnastics.

75. select_nth_unstable — Find the Kth Element Without Sorting

You’re sorting an entire Vec just to grab the median or the top 3 elements. select_nth_unstable does it in O(n) — no full sort required.

The classic approach: sort everything, then index:

1
2
3
4
let mut scores = vec![82, 45, 99, 67, 73, 91, 55];
scores.sort();
let median = scores[scores.len() / 2];
assert_eq!(median, 73);

That’s O(n log n) to answer an O(n) question. select_nth_unstable uses a partial-sort algorithm (quickselect) to put the element at a given index into its final sorted position — everything before it is smaller or equal, everything after is greater or equal:

1
2
3
4
let mut scores = vec![82, 45, 99, 67, 73, 91, 55];
let mid = scores.len() / 2;
scores.select_nth_unstable(mid);
assert_eq!(scores[mid], 73);

The method returns three mutable slices — elements below, the pivot element, and elements above — so you can work with each partition directly:

1
2
3
4
5
6
7
let mut data = vec![10, 80, 30, 90, 50, 70, 20];
let (lower, median, upper) = data.select_nth_unstable(3);
// lower contains 3 elements all <= *median
// upper contains 3 elements all >= *median
assert_eq!(*median, 50);
assert!(lower.iter().all(|&x| x <= 50));
assert!(upper.iter().all(|&x| x >= 50));

Need the top 3 scores without sorting the full list? Select the boundary, then sort only the small tail:

1
2
3
4
5
6
let mut scores = vec![82, 45, 99, 67, 73, 91, 55];
let k = scores.len() - 3;
scores.select_nth_unstable(k);
let top_3 = &mut scores[k..];
top_3.sort_unstable(); // sort only 3 elements, not all 7
assert_eq!(top_3, &[82, 91, 99]);

Custom ordering works too. Find the median by absolute value:

1
2
3
4
let mut vals = vec![-20, 5, -10, 15, -3, 8, 1];
let mid = vals.len() / 2;
vals.select_nth_unstable_by_key(mid, |v| v.abs());
assert_eq!(vals[mid].abs(), 8);

The “unstable” in the name means equal elements might be reordered (like sort_unstable) — it doesn’t mean the API is experimental. This is stable since Rust 1.49, available on any [T] where T: Ord.

36. Cow<str> — Clone on Write

Stop cloning strings “just in case” — Cow<str> lets you borrow when you can and clone only when you must.

The problem

You’re writing a function that sometimes needs to modify a string and sometimes doesn’t. The easy fix? Clone every time:

1
2
3
4
5
6
7
fn ensure_greeting(name: &str) -> String {
    if name.starts_with("Hello") {
        name.to_string() // unnecessary clone!
    } else {
        format!("Hello, {name}!")
    }
}

This works, but that first branch allocates a brand-new String even though name is already perfect as-is. In a hot loop, those wasted allocations add up.

Enter Cow<str>

Cow stands for Clone on Write. It holds either a borrowed reference or an owned value, and only clones when you actually need to mutate or take ownership:

1
2
3
4
5
6
7
8
9
use std::borrow::Cow;

fn ensure_greeting(name: &str) -> Cow<str> {
    if name.starts_with("Hello") {
        Cow::Borrowed(name) // zero-cost: just wraps the reference
    } else {
        Cow::Owned(format!("Hello, {name}!"))
    }
}

Now the happy path (name already starts with “Hello”) does zero allocation. The caller gets a Cow<str> that derefs to &str transparently — most code won’t even notice the difference.

Using Cow values

Because Cow<str> implements Deref<Target = str>, you can use it anywhere a &str is expected:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
use std::borrow::Cow;

fn ensure_greeting(name: &str) -> Cow<str> {
    if name.starts_with("Hello") {
        Cow::Borrowed(name)
    } else {
        Cow::Owned(format!("Hello, {name}!"))
    }
}

fn main() {
    let greeting = ensure_greeting("Hello, world!");
    assert_eq!(&*greeting, "Hello, world!");

    // Call &str methods directly on Cow
    assert!(greeting.contains("world"));

    // Only clone into String when you truly need ownership
    let _owned: String = greeting.into_owned();

    let greeting2 = ensure_greeting("Rust");
    assert_eq!(&*greeting2, "Hello, Rust!");
}

When to reach for Cow

Cow shines in these situations:

  • Conditional transformations — functions that modify input only sometimes (normalization, trimming, escaping)
  • Config/lookup values — return a static default or a dynamically built string
  • Parser outputs — most tokens are slices of the input, but some need unescaping

The Cow type works with any ToOwned pair, not just strings. You can use Cow<[u8]>, Cow<Path>, or Cow<[T]> the same way.

Quick reference

OperationCost
Cow::Borrowed(s)Free — wraps a reference
Cow::Owned(s)Whatever creating the owned value costs
*cow (deref)Free
cow.into_owned()Free if already owned, clones if borrowed
cow.to_mut()Clones if borrowed, then gives &mut access