#157 May 23, 2026

157. Atomic* — The Thread-Safe Cell for Scalars

A Cell<T> lets a single thread mutate through &self — get/set instead of &mut. The atomic types in std::sync::atomic are the same shape, just Sync: a counter, flag, or pointer many threads can poke at without a Mutex, no lock acquisition, no guard, no panic on contention.

The pain: `Mutex<u64>` for a single counter

A request counter shared across worker threads is the textbook reach-for-Arc<Mutex<_>> case — and the textbook overkill:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
use std::sync::{Arc, Mutex};
use std::thread;

let hits = Arc::new(Mutex::new(0u64));
let mut handles = Vec::new();

for _ in 0..8 {
    let h = Arc::clone(&hits);
    handles.push(thread::spawn(move || {
        for _ in 0..1000 {
            let mut g = h.lock().unwrap();   // lock, increment, unlock — 1000 times
            *g += 1;
        }
    }));
}
for h in handles { h.join().unwrap(); }
assert_eq!(*hits.lock().unwrap(), 8_000);

Eight threads contending on a lock for an n += 1 is a lot of ceremony to add one to an integer. The CPU has a single instruction for this. Rust exposes it.

The fix: `AtomicU64` (or `AtomicUsize`, `AtomicBool`, …)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

let hits = Arc::new(AtomicU64::new(0));
let mut handles = Vec::new();

for _ in 0..8 {
    let h = Arc::clone(&hits);
    handles.push(thread::spawn(move || {
        for _ in 0..1000 {
            h.fetch_add(1, Ordering::Relaxed);   // one instruction, no lock
        }
    }));
}
for h in handles { h.join().unwrap(); }
assert_eq!(hits.load(Ordering::Relaxed), 8_000);

No lock(), no guard, no unwrap. fetch_add is a single read-modify-write — on x86 it’s literally lock xadd. The Arc is still there because the threads need shared ownership, but the interior is lock-free.

The API is just `Cell`’s API, with orderings

Every atomic has the same small surface:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
use std::sync::atomic::{AtomicUsize, Ordering};

let n = AtomicUsize::new(7);

// like Cell::get / Cell::set
let v = n.load(Ordering::Relaxed);     assert_eq!(v, 7);
n.store(42, Ordering::Relaxed);
assert_eq!(n.load(Ordering::Relaxed), 42);

// like Cell::replace
let old = n.swap(100, Ordering::Relaxed);
assert_eq!(old, 42);
assert_eq!(n.load(Ordering::Relaxed), 100);

Notice what’s missing: there is no &mut T anywhere. You never borrow the inside. You read out a copy or write one in. That’s why this works across threads at all — there’s nothing to alias.

Read-modify-write: the real reason atomics exist

The fetch_* family is where atomics earn their keep. Each is a single uninterruptible round-trip:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
use std::sync::atomic::{AtomicI32, Ordering};

let n = AtomicI32::new(10);

assert_eq!(n.fetch_add(5, Ordering::Relaxed), 10);  // returns old
assert_eq!(n.load(Ordering::Relaxed), 15);

assert_eq!(n.fetch_sub(3, Ordering::Relaxed), 15);
assert_eq!(n.fetch_or(0b1000, Ordering::Relaxed), 12);
assert_eq!(n.fetch_and(0b1100, Ordering::Relaxed), 0b1100);
assert_eq!(n.load(Ordering::Relaxed), 0b1100);

fetch_add, fetch_sub, fetch_or, fetch_and, fetch_xor, fetch_min, fetch_max — each one returns the value before the operation. That “before” is what makes them composable: you know exactly which thread did the increment that took you from 999 to 1000.

For anything more complex than a single op (clamp, toggle a state machine, transform), reach for update instead of hand-rolling a compare_exchange loop.

`AtomicBool`: the flag that doesn’t need a `Mutex`

The most common “I just want one bit” case:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
use std::sync::atomic::{AtomicBool, Ordering};

let stop = AtomicBool::new(false);

// thread A
stop.store(true, Ordering::Release);

// thread B's hot loop
if stop.load(Ordering::Acquire) {
    // shut down
}
# assert!(stop.load(Ordering::Acquire));

Release on the writer + Acquire on the reader pairs everything written before the store with everything read after the load — the standard cancellation-flag pattern. Relaxed would be fine if stop is the only thing the two threads share; use Acquire/Release when the flag is gating other writes.

std::sync::atomic ships an atomic for every primitive size:

Type	Notes
`AtomicBool`	Lock-free flags
`AtomicU8` / `U16` / `U32` / `U64` / `Usize`	Unsigned counters, bitmasks
`AtomicI8` / `I16` / `I32` / `I64` / `Isize`	Signed deltas
`AtomicPtr<T>`	Raw `*mut T`, for hand-rolled lock-free structures

Not every target supports every width lock-free (32-bit ARM lacks 64-bit CAS, for example). cfg(target_has_atomic = "64") lets you gate code that requires it. On modern x86_64 and aarch64, all of the above are lock-free.

What you give up vs `Mutex<T>`

Atomics work only on values the CPU already knows how to swap in one instruction. The moment you need to atomically update two fields together — a counter and a timestamp, say — you’re back to Mutex<T>. There is no AtomicStruct. You can’t fetch_push a Vec.

The other thing you give up is loud failure. A Mutex poisoned by a panic returns an Err; a deadlock blocks forever and shows up in a stack dump. An atomic happily does the wrong thing forever if you pick the wrong Ordering — the bug manifests as a flaky test under heavy load on a weakly-ordered CPU, and not at all on your laptop. Use SeqCst when in doubt; reach for Relaxed/Acquire/Release only when you can name what’s being synchronized with what.

When to reach for atomics

Counters, flags, generation numbers, fetch_add-style ID allocators, the “is this initialized yet” bit. Anything where the value fits in a register and the only operation is read / write / one-shot RMW.

Anything fatter — a config map, a parsed AST, a connection pool — wants a Mutex<T> or RwLock<T> wrapped in an Arc. And for the “compute once, then read forever” case across threads, there’s a purpose-built tool — that’s this afternoon’s bite.

This post is licensed under CC BY 4.0 by the author.

157. Atomic* — The Thread-Safe Cell for Scalars

The pain: Mutex<u64> for a single counter

The fix: AtomicU64 (or AtomicUsize, AtomicBool, …)

The API is just Cell’s API, with orderings