Utf-8

#189 Jun 2026

189. str::char_indices — Slice a String Without Panicking on Non-ASCII

chars().enumerate() hands you a character count, but &s[..] wants a byte offset. Mix them up and one accented letter blows your program apart.

Say you want everything from the underscore onward. The enumerate version looks right and works fine in tests full of ASCII:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
let s = "café_table"; // 'é' is two bytes in UTF-8

let idx = s
    .chars()
    .enumerate()
    .find(|(_, c)| *c == '_')
    .map(|(i, _)| i)
    .unwrap();

let rest = &s[idx..]; // idx == 4 (char count), but '_' starts at byte 5

idx is 4, the character position. Byte 4 lands in the middle of é, so the slice panics: byte index 4 is not a char boundary.

char_indices yields the real byte offset of each character, which is exactly what slicing expects:

1
2
3
4
5
6
7
8
let idx = s
    .char_indices()
    .find(|(_, c)| *c == '_')
    .map(|(i, _)| i)
    .unwrap();

assert_eq!(idx, 5);
assert_eq!(&s[idx..], "_table"); // no panic, correct slice

The pattern is (byte_offset, char) instead of enumerate’s (count, char). It’s also a DoubleEndedIterator, so next_back gives you the last character and where it begins:

1
2
let (last_off, last_ch) = s.char_indices().next_back().unwrap();
assert_eq!((last_off, last_ch), (10, 'e'));

Rule of thumb: the moment a character index touches &s[..], .split_at(), or any byte-indexed API, reach for char_indices — not enumerate.

#055 Apr 2026

strings utf-8 std

55. floor_char_boundary — Truncate Strings Without Breaking UTF-8

Ever tried to truncate a string to a byte limit and got a panic because you sliced in the middle of a multi-byte character? floor_char_boundary fixes that.

The Problem

Slicing a string at an arbitrary byte index panics if that index lands inside a multi-byte UTF-8 character:

1
2
3
4
5
6
let s = "Héllo 🦀 world";
// This panics at runtime!
// let truncated = &s[..5]; // 'é' spans bytes 1..3, index 5 is fine here
// but what if we don't know the content?
let s = "🦀🦀🦀"; // each crab is 4 bytes
// &s[..5] would panic — byte 5 is inside the second crab!

You could scan backward byte-by-byte checking is_char_boundary(), but that’s tedious and easy to get wrong.

The Fix: `floor_char_boundary`

str::floor_char_boundary(index) returns the largest byte position at or before index that sits on a valid character boundary. Its counterpart ceil_char_boundary gives you the smallest position at or after the index.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
fn main() {
    let s = "🦀🦀🦀"; // each 🦀 is 4 bytes, total 12 bytes

    // We want ~6 bytes, but byte 6 is inside the second crab
    let i = s.floor_char_boundary(6);
    assert_eq!(i, 4); // rounds down to end of first 🦀
    assert_eq!(&s[..i], "🦀");

    // ceil_char_boundary rounds up instead
    let j = s.ceil_char_boundary(6);
    assert_eq!(j, 8); // rounds up to end of second 🦀
    assert_eq!(&s[..j], "🦀🦀");
}

Real-World Use: Safe Truncation

Here’s a practical helper that truncates a string to fit a byte budget, adding an ellipsis if it was shortened:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
fn truncate(s: &str, max_bytes: usize) -> String {
    if s.len() <= max_bytes {
        return s.to_string();
    }
    let end = s.floor_char_boundary(max_bytes.saturating_sub(3));
    format!("{}...", &s[..end])
}

fn main() {
    let bio = "I love Rust 🦀 and crabs!";
    let short = truncate(bio, 16);
    assert_eq!(short, "I love Rust 🦀...");
    // 'I love Rust 🦀' = 15 bytes + '...' = 18 total
    // Safe! No panics, no broken characters.

    // Short strings pass through unchanged
    assert_eq!(truncate("hi", 10), "hi");
}

No more manual boundary scanning — these two methods handle the UTF-8 dance for you.