Ever tried to truncate a string to a byte limit and hit a panic because you sliced in the middle of a multi-byte character? floor_char_boundary fixes that.
The Problem
Slicing a string at an arbitrary byte index panics if that index lands inside a multi-byte UTF-8 character:

    let s = "Héllo 🦀 world";
    // 'é' spans bytes 1..3, so index 5 happens to land on a boundary:
    let truncated = &s[..5]; // "Héll"
    // But what if we don't know the content?
    let s = "🦀🦀🦀"; // each crab is 4 bytes
    // This panics at runtime! Byte 5 is inside the second crab:
    // let truncated = &s[..5];

You could scan backward byte-by-byte checking is_char_boundary(), but that’s tedious and easy to get wrong.
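For comparison, the manual backward scan might look roughly like the sketch below. The helper name floor_boundary_manual is hypothetical and only for illustration. It relies on nothing beyond str::len and str::is_char_boundary, and it assumes an out-of-range index should clamp to the string's length.

```rust
/// Hypothetical helper (not part of std): returns the largest char
/// boundary at or before `index`, found by scanning backward by hand.
fn floor_boundary_manual(s: &str, index: usize) -> usize {
    // Assumption: an index at or past the end clamps to the string length.
    if index >= s.len() {
        return s.len();
    }
    let mut i = index;
    // Byte 0 is always a char boundary, so this loop terminates
    // before `i` can underflow.
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}
```

Calling floor_boundary_manual("🦀🦀🦀", 5) steps back from byte 5 to byte 4, so slicing up to the result yields a single crab. Even in this small form, the clamp and the loop's termination condition are both details you have to get right yourself.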
The Fix: floor_char_boundary
str::floor_char_boundary(index) returns the largest byte position at or before index that sits on a valid character boundary. Its counterpart ceil_char_boundary gives you the smallest position at or after the index.

    fn main() {
        let s = "🦀🦀🦀"; // each 🦀 is 4 bytes, total 12 bytes

        // We want ~6 bytes, but byte 6 is inside the second crab
        let i = s.floor_char_boundary(6);
        assert_eq!(i, 4); // rounds down to the end of the first 🦀
        assert_eq!(&s[..i], "🦀");

        // ceil_char_boundary rounds up instead
        let j = s.ceil_char_boundary(6);
        assert_eq!(j, 8); // rounds up to the end of the second 🦀
        assert_eq!(&s[..j], "🦀🦀");
    }

Real-World Use: Safe Truncation
Here’s a practical helper that truncates a string to fit a byte budget, adding an ellipsis if it was shortened:

    fn truncate(s: &str, max_bytes: usize) -> String {
        if s.len() <= max_bytes {
            return s.to_string();
        }
        // Reserve 3 bytes for "...". Budgets under 3 bytes still yield "...",
        // which exceeds the budget, so pick a sensible minimum.
        let end = s.floor_char_boundary(max_bytes.saturating_sub(3));
        format!("{}...", &s[..end])
    }

    fn main() {
        let bio = "I love Rust 🦀 and crabs!";

        let short = truncate(bio, 19);
        assert_eq!(short, "I love Rust 🦀...");
        // "I love Rust 🦀" = 16 bytes + "..." = 19 total
        // Safe! No panics, no broken characters.

        // With a budget of 16 the crab no longer fits, so it is dropped whole
        assert_eq!(truncate(bio, 16), "I love Rust ...");

        // Short strings pass through unchanged
        assert_eq!(truncate("hi", 10), "hi");
    }

No more manual boundary scanning — these two methods handle the UTF-8 dance for you.