Problem splitting regular ASCII symbols in a string

Time:10-14

Just had this error pop up while messing around with some graphics for a terminal interface...

thread 'main' panicked at 'byte index 2 is not a char boundary; it is inside '░' (bytes 1..4) of ░▒▓█', src/main.rs:38:6

Can I not use these characters, or do I need to work some magic to support what I thought were default ASCII characters?

(Here's the related code for those wondering.)

// Example call with the same parameters that led to this issue.
charlist(&" ░▒▓█".to_string(), 0.66);

// Returns the n-th character in a string.
// (Where N is a float value from 0 to 1,
// 0 being the start of the string and 1 the end.)
fn charlist<'a>(chars: &'a String, amount: f64) -> &'a str {
    let chl: f64 = chars.chars().count() as f64;  // Length of the string
    let chpos = -((amount*chl)%chl) as i32;  // Scalar converted to integer position in the string
    &chars[chpos as usize..chpos as usize + 1]  // Slice the single requested character
}

Answer:

You seem to have a couple of misconceptions, so let me address them in order.

  1. ░, ▒, ▓ and █ are not ASCII characters! They are Unicode code points outside the ASCII range (ASCII only covers U+0000 to U+007F). You can check this with the following simple experiment.
fn main() {
    let slice = " ░▒▓█";
    for c in slice.chars() {
        println!("{}, {}", c, c.len_utf8());
    } 
}

This code has output:

 , 1
░, 3
▒, 3
▓, 3
█, 3

As you can see, these "boxy" characters are 3 bytes long each! Rust uses UTF-8 encoding for all of its strings. This leads to another misconception.

  2. In the line &chars[chpos as usize..chpos as usize + 1] you are trying to take a slice that is one byte long. String slices in Rust are indexed by byte offset, not by character, so you ended up slicing in the middle of a character that is 3 bytes long, and that is what caused the panic. In general, a character in UTF-8 takes between one and four bytes. To get a char's length in bytes, use the len_utf8 method.
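To make the byte-indexing point concrete, here is a minimal sketch using the standard `str::is_char_boundary` and `str::get` methods. The helper name `safe_slice` is made up for this illustration and is not part of the question's code.

```rust
// Hypothetical helper: returns the byte range as a &str, or None if either
// end falls inside a multi-byte character or is out of range.
fn safe_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    let s = " ░▒▓█";
    // The space takes 1 byte and each block character 3 bytes.
    println!("length in bytes: {}", s.len());
    // Byte 1 is where '░' starts; byte 2 is in the middle of it.
    println!("boundary at 1: {}", s.is_char_boundary(1));
    println!("boundary at 2: {}", s.is_char_boundary(2));
    // `&s[1..2]` would panic here; `get` reports the problem as None instead.
    println!("{:?}", safe_slice(s, 1, 2));
    // Bytes 1..4 cover the whole of '░', so this slice is valid.
    println!("{:?}", safe_slice(s, 1, 4));
}
```

Whether returning `None` or panicking is the better behaviour depends on the caller; `get` simply lets the caller decide.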

  3. You can get an iterator over the characters of a string slice with the chars method. Getting the n-th character is then as easy as calling the iterator's nth method (which counts from zero), so the following holds:

assert_eq!(" ░▒▓█".chars().nth(3).unwrap(), '▓');

If you also want the byte index of each char, you can use the char_indices method.
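If the function should keep returning a &str rather than a char, char_indices gives enough information to slice safely. The following is only a sketch; `nth_char_slice` is a hypothetical name chosen for this example.

```rust
// Hypothetical helper: returns the n-th character (zero-based) as a &str
// slice of the original string, or None if the string is too short.
fn nth_char_slice(s: &str, n: usize) -> Option<&str> {
    // char_indices yields (byte offset, char) pairs.
    let (start, ch) = s.char_indices().nth(n)?;
    // The character ends len_utf8() bytes after its start, so both ends
    // of this range are guaranteed to lie on char boundaries.
    Some(&s[start..start + ch.len_utf8()])
}

fn main() {
    let s = " ░▒▓█";
    for (offset, ch) in s.char_indices() {
        println!("byte {}: {:?}", offset, ch);
    }
    println!("{:?}", nth_char_slice(s, 2));
    println!("{:?}", nth_char_slice(s, 9));
}
```

Because the slice borrows from the input, no allocation is needed to hand back a single character.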

  4. Using an f64 value to select the n-th character is odd, and I would encourage you to rethink whether you really want to do this. If you do, remember that because characters have variable length, the len method of a string slice returns its length in bytes, not the number of characters. The only way to find out how many characters a string contains is to iterate over it. So if you want, for example, the middle character, you first have to know how many characters there are. I can think of two ways to do this.

    • You can collect the characters into a Vec<char> (or something similar). Then you know how many characters there are and can index the n-th one in O(1). However, this results in an additional memory allocation.

    • You can first count the characters with slice.chars().count(), then calculate the position of the one you want and get it by iterating over chars again with nth (as I showed above). This avoids the additional allocation, but it walks the string up to twice: once in full to count, and once more up to the requested position. That is still O(n), just with a larger constant.

Which one you pick is up to you. You will have to make a compromise.
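Both options can be sketched as follows. The function names are hypothetical, and the mapping from the fraction to an index (floor of amount times length, clamped to the last character) is an assumption made for this example; the question's code wraps around with % instead.

```rust
// Assumed mapping from a fraction in 0.0..=1.0 to a character index:
// floor(amount * len), clamped so that amount = 1.0 picks the last character.
fn scaled_index(len: usize, amount: f64) -> Option<usize> {
    if len == 0 || !(0.0..=1.0).contains(&amount) {
        return None;
    }
    Some(((amount * len as f64) as usize).min(len - 1))
}

// Option 1: collect into a Vec<char>, then index in O(1).
// This allocates a buffer holding every character of the string.
fn pick_collected(s: &str, amount: f64) -> Option<char> {
    let chars: Vec<char> = s.chars().collect();
    let idx = scaled_index(chars.len(), amount)?;
    chars.get(idx).copied()
}

// Option 2: count the characters, then walk the iterator a second time.
// No allocation, but the string is traversed up to twice.
fn pick_two_pass(s: &str, amount: f64) -> Option<char> {
    let idx = scaled_index(s.chars().count(), amount)?;
    s.chars().nth(idx)
}

fn main() {
    let s = " ░▒▓█";
    println!("{:?}", pick_collected(s, 0.66));
    println!("{:?}", pick_two_pass(s, 0.66));
}
```

For a five-character palette like this one, the difference between the two is negligible; it only starts to matter for long strings or hot loops.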

  5. This isn't really a correctness problem, but prefer &str over &String for function arguments, as it gives your callers more flexibility. You also don't have to spell out the lifetime when a function takes a single reference argument and returns a reference: Rust's lifetime elision rules give both the same lifetime.
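A short sketch of what that flexibility looks like in practice. `first_char` is a hypothetical function used only to show the signature style.

```rust
// Hypothetical helper: the first character of `s` as a slice, or "" if empty.
// No explicit lifetime is written: with a single reference parameter, the
// elision rules tie the returned slice to `s`.
fn first_char(s: &str) -> &str {
    match s.chars().next() {
        Some(ch) => &s[..ch.len_utf8()],
        None => "",
    }
}

fn main() {
    let owned: String = "░▒▓█".to_string();
    // A &String coerces to &str automatically (deref coercion).
    println!("{}", first_char(&owned));
    // A string literal is already a &str, so no String has to be allocated.
    println!("{}", first_char("█▓▒░"));
}
```

With a &String parameter, the second call would not compile without first building a String from the literal.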