#055 Apr 1, 2026

55. floor_char_boundary — Truncate Strings Without Breaking UTF-8

Ever tried to truncate a string to a byte limit and hit a panic because the slice landed in the middle of a multi-byte character? floor_char_boundary fixes that.

The Problem

Slicing a string at an arbitrary byte index panics if that index lands inside a multi-byte UTF-8 character:

let s = "Héllo 🦀 world";
// 'é' spans bytes 1..3, so &s[..5] happens to land on a boundary ("Héll").
// &s[..2] would panic at runtime: byte 2 is inside 'é'.
// But what if we don't know the content in advance?
let s = "🦀🦀🦀"; // each crab is 4 bytes
// &s[..5] would panic: byte 5 is inside the second crab!

You could scan backward byte-by-byte checking is_char_boundary(), but that’s tedious and easy to get wrong.
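For comparison, here is a minimal sketch of that manual backward scan. The name manual_floor is hypothetical (it is not part of the standard library), and the sketch assumes an index past the end should clamp to the string length:

```rust
/// Hypothetical helper: the largest char boundary at or before `index`,
/// found by scanning backward with `is_char_boundary`.
fn manual_floor(s: &str, index: usize) -> usize {
    // Assumption: anything at or past the end clamps to the string length.
    if index >= s.len() {
        return s.len();
    }
    let mut i = index;
    // Byte 0 is always a boundary, so this loop cannot underflow.
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}

fn main() {
    let s = "🦀🦀🦀"; // each crab is 4 bytes
    assert_eq!(manual_floor(s, 5), 4); // byte 5 is inside the second crab
    assert_eq!(manual_floor(s, 8), 8); // already on a boundary
    assert_eq!(manual_floor(s, 100), 12); // past the end clamps to len
    assert_eq!(&s[..manual_floor(s, 5)], "🦀");
}
```

The loop is short, but the clamp and the underflow reasoning are exactly the details that are easy to forget.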

The Fix: `floor_char_boundary`

str::floor_char_boundary(index) returns the largest byte position at or before index that sits on a valid character boundary. Its counterpart ceil_char_boundary gives you the smallest position at or after the index.

fn main() {
    let s = "🦀🦀🦀"; // each 🦀 is 4 bytes, total 12 bytes

    // We want ~6 bytes, but byte 6 is inside the second crab
    let i = s.floor_char_boundary(6);
    assert_eq!(i, 4); // rounds down to end of first 🦀
    assert_eq!(&s[..i], "🦀");

    // ceil_char_boundary rounds up instead
    let j = s.ceil_char_boundary(6);
    assert_eq!(j, 8); // rounds up to end of second 🦀
    assert_eq!(&s[..j], "🦀🦀");
}

Real-World Use: Safe Truncation

Here’s a practical helper that truncates a string to fit a byte budget, adding an ellipsis if it was shortened:

fn truncate(s: &str, max_bytes: usize) -> String {
    if s.len() <= max_bytes {
        return s.to_string();
    }
    let end = s.floor_char_boundary(max_bytes.saturating_sub(3));
    format!("{}...", &s[..end])
}

fn main() {
    let bio = "I love Rust 🦀 and crabs!";
    let short = truncate(bio, 16);
    assert_eq!(short, "I love Rust ...");
    // Budget 16 minus 3 for "..." leaves 13, which falls inside 🦀 (bytes 12..16),
    // so the cut rounds down to byte 12: 12 bytes + "..." = 15 total.
    // Safe! No panics, no broken characters.

    // Short strings pass through unchanged
    assert_eq!(truncate("hi", 10), "hi");
}

No more manual boundary scanning — these two methods handle the UTF-8 dance for you.

This post is licensed under CC BY 4.0 by the author.
