Numerics

#208 Jun 2026

208. f64::mul_add — One Rounding, One Instruction, Better Accuracy

a * b + c rounds twice and may compile to two separate operations. a.mul_add(b, c) computes the whole thing at full precision, rounds once, and on a CPU with FMA folds into a single instruction.

Two roundings vs. one

Floating-point math rounds after every operation. Writing a * b + c first rounds the product a * b to the nearest f64, then rounds again after adding c. That intermediate rounding throws away bits before they ever reach the sum:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
let a = 1.0_f64;
let b = 2.0_f64;
let c = 3.0_f64;

// Two roundings: (a*b) rounded, then (+ c) rounded
let separate = a * b + c;

// One rounding: a*b + c evaluated at full precision, rounded once
let fused = a.mul_add(b, c);

assert_eq!(separate, 5.0);
assert_eq!(fused, 5.0);

For tidy values they agree. The difference shows up when the product needs more bits than an f64 can hold and the addition would have recovered them — exactly the catastrophic-cancellation cases that wreck numeric code.

Where it pays off: Horner’s method

Polynomial evaluation is built out of multiply-then-add, so it’s the textbook home for mul_add. To evaluate 2x² + 3x + 4 with Horner’s method you nest the operations, and each step is one fused step:

1
2
3
4
5
6
fn poly(x: f64) -> f64 {
    // ((2*x) + 3)*x + 4
    2.0_f64.mul_add(x, 3.0).mul_add(x, 4.0)
}

assert_eq!(poly(2.0), 18.0); // 2*4 + 3*2 + 4

Each mul_add keeps full precision through the chain instead of rounding at every * and +. The same pattern carries dot products, which are a sum of products:

1
2
3
4
5
6
7
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b)
        .fold(0.0, |acc, (&x, &y)| x.mul_add(y, acc))
}

assert_eq!(dot(&[1.0, 2.0, 3.0], &[4.0, 5.0, 6.0]), 32.0);

The honest caveat

mul_add is faster only when the target CPU has a hardware fused-multiply-add instruction (modern x86-64 with FMA, AArch64, most others you’ll deploy to). Where it doesn’t, the standard library has to emulate the exact single-rounding semantics in software, and that emulation is slower than a plain a * b + c. So this is a hot-path tool: reach for it in tight numeric loops on hardware you control (or behind target-feature/target-cpu flags), and lean on it freely when you need the extra accuracy regardless of speed.

The bottom line

mul_add gives you a single rounding step and, on capable hardware, a single instruction for a * b + c. Use it in polynomial evaluation, dot products, and any multiply-accumulate loop where precision or throughput matters — and remember it can regress on FMA-less targets, so measure when speed is the goal.

#073 Apr 2026

numerics overflow algorithms

73. u64::midpoint — Average Two Numbers Without Overflow

Computing the average of two integers sounds trivial — until it overflows. The midpoint method gives you a correct result every time, no wider types required.

The classic binary search bug lurks in this innocent-looking line:

1
let mid = (low + high) / 2;

When low and high are both large, the addition wraps around and you get garbage. This has bitten production code in every language for decades.

The textbook workaround avoids the addition entirely:

1
let mid = low + (high - low) / 2;

This works — but only when low <= high, and it’s one more thing to get wrong under pressure.

Rust’s midpoint method handles all of this for you:

1
2
3
4
5
6
7
8
9
let a: u64 = u64::MAX - 1;
let b: u64 = u64::MAX;

// This would panic in debug or wrap in release:
// let avg = (a + b) / 2;

// Safe and correct:
let avg = a.midpoint(b);
assert_eq!(avg, u64::MAX - 1);

It works on signed integers too, rounding toward zero:

1
2
3
let x: i32 = -3;
let y: i32 = 4;
assert_eq!(x.midpoint(y), 0);  // rounds toward zero, not -∞

And on floats, where it’s computed without intermediate overflow:

1
2
3
let a: f64 = f64::MAX;
let b: f64 = f64::MAX;
assert_eq!(a.midpoint(b), f64::MAX);  // not infinity

Here’s a binary search that actually works for the full u64 range:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
fn binary_search(sorted: &[i32], target: i32) -> Option<usize> {
    let mut low: usize = 0;
    let mut high: usize = sorted.len();
    while low < high {
        let mid = low.midpoint(high);
        match sorted[mid].cmp(&target) {
            std::cmp::Ordering::Less => low = mid + 1,
            std::cmp::Ordering::Greater => high = mid,
            std::cmp::Ordering::Equal => return Some(mid),
        }
    }
    None
}

let data = vec![1, 3, 5, 7, 9, 11];
assert_eq!(binary_search(&data, 7), Some(3));
assert_eq!(binary_search(&data, 4), None);

Available on all integer types (u8 through u128, i8 through i128, usize, isize) and floats (f32, f64). No crate needed — it’s in the standard library.