146. char::MAX_LEN_UTF8 — Size UTF-8 Buffers Without Magic Numbers
Every time you’ve called char::encode_utf8, you’ve written [0u8; 4] from memory. Rust 1.93 stabilises char::MAX_LEN_UTF8 so you don’t have to keep that magic number in your head.
The magic number you keep typing
encode_utf8 writes the UTF-8 bytes of a char into a &mut [u8] and returns a &mut str pointing at the written portion. The slice has to be big enough — which means knowing that the worst-case UTF-8 encoding is 4 bytes:
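A minimal sketch of that pattern, with the buffer length written as a bare literal. The helper name encoded_len_magic and the sample characters are hypothetical, chosen only for illustration:

```rust
/// Hypothetical helper: encodes `c` into a stack buffer whose length is a
/// bare literal and returns how many bytes the encoding used.
fn encoded_len_magic(c: char) -> usize {
    // 4 is the worst-case UTF-8 length of a char, typed from memory.
    let mut buf = [0u8; 4];
    // encode_utf8 returns a &mut str covering only the bytes it wrote.
    let s: &mut str = c.encode_utf8(&mut buf);
    s.len()
}

fn main() {
    for c in ['a', 'é', '€', '🦀'] {
        println!("{c:?} needs {} byte(s)", encoded_len_magic(c));
    }
}
```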
That 4 is correct but unexplained. Anyone reading your code has to either trust you or go re-derive the UTF-8 spec.
The named version
Rust 1.93 stabilises two constants on char:
MAX_LEN_UTF8 is the maximum number of u8s encode_utf8 can ever write. MAX_LEN_UTF16 is the same for encode_utf16 (a surrogate pair = 2 u16s). Drop them straight into your buffer declarations:
Same behaviour, but the intent is self-documenting: the buffer is sized to hold any single char, by definition.
Sizing a buffer for N chars
Where this really pays off is when you’re computing a buffer for several chars on the stack:
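The buffer length is now derived from the constant rather than from a hand-multiplied literal. If the scalar value range ever grew and MAX_LEN_UTF8 grew with it (unlikely: Unicode has fixed its code space at U+10FFFF), the buffer would grow too. A hardcoded 4 would not cause a silent buffer overflow, though: encode_utf8 panics when the slice is too small, so the failure would surface as a runtime panic on the first character that no longer fits.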
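To see what an undersized buffer does in practice, here is a small sketch that catches the panic instead of letting it abort the program. The helper undersized_buffer_panics is hypothetical, and the 3-byte buffer is deliberately too small for characters outside the Basic Multilingual Plane:

```rust
use std::panic;

/// Hypothetical helper: returns true if encoding `c` into a 3-byte buffer
/// panics, which encode_utf8 does when the slice is too small.
fn undersized_buffer_panics(c: char) -> bool {
    panic::catch_unwind(|| {
        // Deliberately one byte short of the worst case.
        let mut buf = [0u8; 3];
        c.encode_utf8(&mut buf).len()
    })
    .is_err()
}

fn main() {
    println!("'€' panics: {}", undersized_buffer_panics('€'));
    println!("'🦀' panics: {}", undersized_buffer_panics('🦀'));
}
```

The default panic hook still prints the panic message to stderr; only the unwinding is caught. This relies on the default unwinding panic strategy.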
Why bother?
It’s a small change: two constants, no new behaviour. But it removes a real source of off-by-one bugs (people writing [0u8; 3] because they “only handle the Basic Multilingual Plane”, then panicking on the first emoji) and makes UTF-8 buffer code legible at a glance. Available since Rust 1.93 (January 2026).