A Cell<T> lets a single thread mutate through &self: get/set instead of &mut. The atomic types in std::sync::atomic have the same shape, just Sync: a counter, flag, or pointer that many threads can poke at without a Mutex. No lock to acquire, no guard to hold, no thread parked under contention.
The pain: Mutex<u64> for a single counter
A request counter shared across worker threads is the textbook reach-for-Arc<Mutex<_>> case — and the textbook overkill:
```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let hits = Arc::new(Mutex::new(0u64));
    let mut handles = Vec::new();
    for _ in 0..8 {
        let h = Arc::clone(&hits);
        handles.push(thread::spawn(move || {
            for _ in 0..1000 {
                let mut g = h.lock().unwrap(); // lock, increment, unlock, 1000 times
                *g += 1;
            }
        }));
    }
    for h in handles { h.join().unwrap(); }
    assert_eq!(*hits.lock().unwrap(), 8_000);
}
```
Eight threads contending on a lock for an n += 1 is a lot of ceremony to add one to an integer. The CPU has a single instruction for this. Rust exposes it.
The fix: AtomicU64 (or AtomicUsize, AtomicBool, …)
```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

fn main() {
    let hits = Arc::new(AtomicU64::new(0));
    let mut handles = Vec::new();
    for _ in 0..8 {
        let h = Arc::clone(&hits);
        handles.push(thread::spawn(move || {
            for _ in 0..1000 {
                h.fetch_add(1, Ordering::Relaxed); // one atomic RMW instruction, no Mutex
            }
        }));
    }
    for h in handles { h.join().unwrap(); }
    assert_eq!(hits.load(Ordering::Relaxed), 8_000);
}
```
No lock(), no guard, no unwrap. fetch_add is a single read-modify-write. On x86-64 it compiles to one lock-prefixed instruction: lock xadd, or lock add when the returned value is unused. The Arc is still there because the threads need shared ownership, but the interior is lock-free.
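If the threads never outlive the counter, std::thread::scope (stable since Rust 1.63) lets them borrow the atomic directly, and the Arc goes away as well. Here is a minimal sketch; count_hits is a hypothetical helper name, not part of any library.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Hypothetical helper: `threads` workers each add `per_thread` hits
/// to one shared counter, and the final total is returned.
fn count_hits(threads: usize, per_thread: u64) -> u64 {
    let hits = AtomicU64::new(0);
    // Scoped threads are joined before `scope` returns, so they may
    // borrow `hits` from this stack frame: no Arc, no clone.
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    hits.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    // All scoped threads have been joined, so this load sees every increment.
    hits.load(Ordering::Relaxed)
}

fn main() {
    assert_eq!(count_hits(8, 1000), 8_000);
}
```

The join at the end of the scope is what makes the final Relaxed load safe. Relaxed only has to make each increment atomic; it does not have to order anything else.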
The API is just Cell’s API, with orderings
Every atomic has the same small surface:
```rust
use std::sync::atomic::{AtomicUsize, Ordering};

fn main() {
    let n = AtomicUsize::new(7);

    // like Cell::get / Cell::set
    let v = n.load(Ordering::Relaxed);
    assert_eq!(v, 7);
    n.store(42, Ordering::Relaxed);
    assert_eq!(n.load(Ordering::Relaxed), 42);

    // like Cell::replace
    let old = n.swap(100, Ordering::Relaxed);
    assert_eq!(old, 42);
    assert_eq!(n.load(Ordering::Relaxed), 100);
}
```
Notice what’s missing: nothing callable through &self hands you a &T or &mut T to the inside. You never borrow the contents; you read out a copy or write one in. That’s why this works across threads at all: there’s no reference into the value for another thread to alias.
Read-modify-write: the real reason atomics exist
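The fetch_* family is where atomics earn their keep. Each one reads the current value, applies the operation, and writes the result back as a single indivisible step, so no other thread's write can land in between: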
```rust
use std::sync::atomic::{AtomicI32, Ordering};

fn main() {
    let n = AtomicI32::new(10);
    assert_eq!(n.fetch_add(5, Ordering::Relaxed), 10); // returns old
    assert_eq!(n.load(Ordering::Relaxed), 15);
    assert_eq!(n.fetch_sub(3, Ordering::Relaxed), 15); // now 12 = 0b1100
    assert_eq!(n.fetch_or(0b0001, Ordering::Relaxed), 0b1100); // sets bit 0: now 0b1101
    assert_eq!(n.fetch_and(0b1100, Ordering::Relaxed), 0b1101); // clears bit 0 again
    assert_eq!(n.load(Ordering::Relaxed), 0b1100);
}
```
fetch_add, fetch_sub, fetch_or, fetch_and, fetch_xor, fetch_min, fetch_max — each one returns the value before the operation. That “before” is what makes them composable: you know exactly which thread did the increment that took you from 999 to 1000.
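Because every fetch_add hands back the value it replaced, each intermediate value is observed by exactly one caller. The sketch below has several threads race on one counter and counts how many increments saw a given old value. With 8 threads doing 1000 increments each, exactly one of them performs the step from 999 to 1000. observers_of is a hypothetical helper for illustration.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Hypothetical helper: race `threads` workers, each doing `per_thread`
/// increments, and count how many increments returned `old == target`.
fn observers_of(threads: u64, per_thread: u64, target: u64) -> u64 {
    let n = AtomicU64::new(0);
    let seen = AtomicU64::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    // fetch_add returns the value *before* this increment;
                    // no two calls can ever get the same one back.
                    if n.fetch_add(1, Ordering::Relaxed) == target {
                        seen.fetch_add(1, Ordering::Relaxed);
                    }
                }
            });
        }
    });
    seen.load(Ordering::Relaxed)
}

fn main() {
    // 8 * 1000 increments pass through 999 exactly once.
    assert_eq!(observers_of(8, 1000, 999), 1);
}
```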
For anything more complex than a single op (clamp, toggle a state machine, transform), reach for fetch_update, which takes a closure, instead of hand-rolling a compare_exchange loop.
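Here is a sketch of a capped increment built on fetch_update. The method runs the closure inside a compare-exchange retry loop and abandons the update if the closure returns None. bounded_increment is a hypothetical name; Relaxed is assumed because the counter guards no other data.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical helper: increment `n`, but never past `max`.
/// Ok(previous) on success, Err(current) if already at the cap.
fn bounded_increment(n: &AtomicU32, max: u32) -> Result<u32, u32> {
    // fetch_update retries the closure until no other thread raced us;
    // returning None from the closure leaves the value untouched.
    n.fetch_update(Ordering::Relaxed, Ordering::Relaxed, |v| {
        if v < max { Some(v + 1) } else { None }
    })
}

fn main() {
    let slots = AtomicU32::new(0);
    assert_eq!(bounded_increment(&slots, 2), Ok(0));
    assert_eq!(bounded_increment(&slots, 2), Ok(1));
    assert_eq!(bounded_increment(&slots, 2), Err(2)); // at the cap
    assert_eq!(slots.load(Ordering::Relaxed), 2);
}
```

The closure may run more than once under contention, so it should be a pure function of its input.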
AtomicBool: the flag that doesn’t need a Mutex
The most common “I just want one bit” case:
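```rust
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

fn main() {
    let stop = AtomicBool::new(false);

    thread::scope(|s| {
        // thread B's hot loop: keep working until asked to stop
        s.spawn(|| {
            while !stop.load(Ordering::Acquire) {
                hint::spin_loop(); // stand-in for real work
            }
            // shut down
        });

        // thread A
        stop.store(true, Ordering::Release);
    });

    assert!(stop.load(Ordering::Acquire));
}
```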
Release on the writer and Acquire on the reader form a pair. Once the Acquire load reads the true written by the Release store, every write thread A made before the store is visible to everything thread B does after the load. That is the standard cancellation-flag pattern. Relaxed would be fine if stop is the only thing the two threads share; use Acquire/Release when the flag is gating other writes.
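Here is a sketch of a flag that gates other writes. One thread stores a payload and then raises ready; the other waits for ready and then reads the payload. The payload is itself an atomic written with Relaxed so the example needs no unsafe. The Release/Acquire pair on the flag is what guarantees the reader sees the payload. publish_and_read is a hypothetical helper.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::thread;

/// Hypothetical helper: one thread publishes `value` behind a ready flag;
/// another waits for the flag and returns what it reads.
fn publish_and_read(value: u64) -> u64 {
    let data = AtomicU64::new(0);
    let ready = AtomicBool::new(false);

    thread::scope(|s| {
        // Reader: spin until the flag is up, then read the payload.
        let reader = s.spawn(|| {
            while !ready.load(Ordering::Acquire) {
                hint::spin_loop();
            }
            // The Acquire load synchronized with the Release store, so the
            // earlier Relaxed write to `data` is guaranteed visible here.
            // With Relaxed on the flag too, reading 0 would be allowed.
            data.load(Ordering::Relaxed)
        });

        // Writer: payload first, then the flag with Release.
        data.store(value, Ordering::Relaxed);
        ready.store(true, Ordering::Release);

        reader.join().unwrap()
    })
}

fn main() {
    assert_eq!(publish_and_read(42), 42);
}
```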
std::sync::atomic ships an atomic type for bool, for every integer width from 8 to 64 bits plus the pointer-sized ones, and for raw pointers:
| Type | Notes |
|---|---|
| AtomicBool | Lock-free flags |
| AtomicU8 / U16 / U32 / U64 / Usize | Unsigned counters, bitmasks |
| AtomicI8 / I16 / I32 / I64 / Isize | Signed deltas |
| AtomicPtr<T> | Raw *mut T, for hand-rolled lock-free structures |
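Not every target has every width. Where the hardware can't do an operation lock-free, Rust leaves the type or method out rather than emulating it with a hidden lock. For example, PowerPC and MIPS with 32-bit pointers have no AtomicU64/AtomicI64, and thumbv6m targets get loads and stores but no compare-and-swap. cfg(target_has_atomic = "64") lets you gate code that requires 64-bit atomics. On modern x86_64 and aarch64, all of the above are lock-free.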
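Here is a sketch of gating on cfg(target_has_atomic = "64"). It uses AtomicU64 where the target has it and falls back to a Mutex<u64> elsewhere, behind one API. The Counter type, its methods, and the REQUESTS static are hypothetical names for illustration.

```rust
#[cfg(target_has_atomic = "64")]
mod counter {
    use std::sync::atomic::{AtomicU64, Ordering};

    // Hypothetical Counter: lock-free when 64-bit atomics exist.
    pub struct Counter(AtomicU64);

    impl Counter {
        pub const fn new() -> Self { Counter(AtomicU64::new(0)) }
        /// Adds one and returns the previous value.
        pub fn incr(&self) -> u64 { self.0.fetch_add(1, Ordering::Relaxed) }
        pub fn get(&self) -> u64 { self.0.load(Ordering::Relaxed) }
    }
}

#[cfg(not(target_has_atomic = "64"))]
mod counter {
    use std::sync::Mutex;

    // Fallback for targets without 64-bit atomics: same API, behind a lock.
    pub struct Counter(Mutex<u64>);

    impl Counter {
        pub const fn new() -> Self { Counter(Mutex::new(0)) }
        /// Adds one and returns the previous value.
        pub fn incr(&self) -> u64 {
            let mut g = self.0.lock().unwrap();
            let old = *g;
            *g += 1;
            old
        }
        pub fn get(&self) -> u64 { *self.0.lock().unwrap() }
    }
}

use counter::Counter;

// Both `new`s are const fns, so either version can live in a static.
static REQUESTS: Counter = Counter::new();

fn main() {
    assert_eq!(REQUESTS.incr(), 0);
    assert_eq!(REQUESTS.incr(), 1);
    assert_eq!(REQUESTS.get(), 2);
}
```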
What you give up vs Mutex<T>
Atomics work only on values the CPU already knows how to swap in one instruction. The moment you need to atomically update two fields together — a counter and a timestamp, say — you’re back to Mutex<T>. There is no AtomicStruct. You can’t fetch_push a Vec.
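Here is a minimal sketch of the counter-plus-timestamp case. Both fields live in one struct behind one Mutex, so a reader can never see a count from one update paired with a timestamp from another. Two separate atomics could not promise that. Stats, Tracker, record, and snapshot are hypothetical names, and the timestamp is assumed to be a plain u64 passed in by the caller to keep the example deterministic.

```rust
use std::sync::Mutex;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Stats {
    count: u64,
    last_seen: u64, // hypothetical caller-supplied timestamp
}

struct Tracker {
    stats: Mutex<Stats>,
}

impl Tracker {
    fn new() -> Self {
        Tracker { stats: Mutex::new(Stats { count: 0, last_seen: 0 }) }
    }

    /// Bumps the count and records the timestamp under one lock,
    /// so the two fields always change together.
    fn record(&self, now: u64) {
        let mut s = self.stats.lock().unwrap();
        s.count += 1;
        s.last_seen = now;
    }

    /// Copies both fields out under the same lock: always a consistent pair.
    fn snapshot(&self) -> Stats {
        *self.stats.lock().unwrap()
    }
}

fn main() {
    let t = Tracker::new();
    t.record(100);
    t.record(250);
    assert_eq!(t.snapshot(), Stats { count: 2, last_seen: 250 });
}
```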
The other thing you give up is loud failure. A Mutex poisoned by a panic returns an Err; a deadlock blocks forever and shows up in a stack dump. An atomic with the wrong Ordering happily does the wrong thing forever. The bug surfaces as a flaky test under heavy load on a weakly-ordered CPU such as ARM, and may never reproduce on an x86 laptop. Use SeqCst when in doubt; reach for Relaxed/Acquire/Release only when you can name what’s being synchronized with what.
When to reach for atomics
Counters, flags, generation numbers, fetch_add-style ID allocators, the “is this initialized yet” bit. Anything where the value fits in a register and the only operation is read / write / one-shot RMW.
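Here is a sketch of the fetch_add-style ID allocator. A static AtomicU64 hands out unique IDs with no Arc and no lock, because every caller gets back a different old value. NEXT_ID and next_id are hypothetical names. Relaxed is assumed to be enough because the IDs only need to be unique, not ordered against other memory.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// A static atomic needs no Arc: it lives forever and AtomicU64 is Sync.
static NEXT_ID: AtomicU64 = AtomicU64::new(1);

/// Returns a fresh ID. Calls never collide unless the u64 wraps around,
/// which fetch_add does silently on overflow.
fn next_id() -> u64 {
    NEXT_ID.fetch_add(1, Ordering::Relaxed)
}

fn main() {
    // Four threads each grab 100 IDs; collect them all and check uniqueness.
    let ids: Vec<u64> = thread::scope(|s| {
        let handles: Vec<_> = (0..4)
            .map(|_| s.spawn(|| (0..100).map(|_| next_id()).collect::<Vec<u64>>()))
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    });
    let unique: HashSet<u64> = ids.iter().copied().collect();
    assert_eq!(ids.len(), 400);
    assert_eq!(unique.len(), 400);
}
```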
Anything fatter — a config map, a parsed AST, a connection pool — wants a Mutex<T> or RwLock<T> wrapped in an Arc. And for the “compute once, then read forever” case across threads, there’s a purpose-built tool — that’s this afternoon’s bite.