Home > Software engineering >  Rust - how to efficiently remove characters from beggining of string?
Rust - how to efficiently remove characters from beggining of string?

Time:11-20

I have a long string stored in a variable in Rust. I often remove some characters from its front with a drain method and use the value returned from it:

my_str.drain(0..i).collect::<String>();

The problem is, that draining from this string is done really often in the program and it's slowing it down a lot (it takes ~99.6% of runtime). This is a very expensive operation, since every time, the entire string has to be moved left in the memory.

I do not drain from the end of the string at all (which should be much faster), just from the front.

How can I make this more efficient? Is there some alternative to String, that uses a different memory layout, which would be better for this use case?

CodePudding user response:

As proposed by @SUTerliakov, using VecDeque<char> in this case is much more effective than String either with the pop_front method or the drain method (when draining from the front of course)

CodePudding user response:

As stated by @Jmb, keeping the original string intact and working with slices is certainly a big win.

I don't know, from the question, the context and usage of these strings, but this quick and dirty benchmark shows a substantial difference in performances.

This benchmark is flawed because there is a useless clone() at each repetition, there is no warm-up, there is no black-box for the result, there are no statistics... but it just gives an idea.

use std::time::Instant;

fn with_drain(mut my_str: String) -> usize {
    let mut total = 0;
    'work: loop {
        for &i in [1, 2, 3, 4, 5].iter().cycle() {
            if my_str.len() < i {
                break 'work;
            }
            let s = my_str.drain(0..i).collect::<String>();
            total  = s.len();
        }
    }
    total
}

fn with_slice(my_str: String) -> usize {
    let mut total = 0;
    let mut pos = 0;
    'work: loop {
        for &i in [1, 2, 3, 4, 5].iter().cycle() {
            let next_pos = pos   i;
            if my_str.len() <= next_pos {
                break 'work;
            }
            let s = &my_str[pos..next_pos];
            pos = next_pos;
            total  = s.len();
        }
    }
    total
}

fn main() {
    let my_str="I have a long string stored in a variable in Rust.
I often remove some characters from its front with a drain method and use the value returned from it:
my_str.drain(0..i).collect::<String>();
The problem is, that draining from this string is done really often in the program and it's slowing it down a lot (it takes ~99.6% of runtime). This is a very expensive operation, since every time, the entire string has to be moved left in the memory.
I do not drain from the end of the string at all (which should be much faster), just from the front.
How can I make this more efficient? Is there some alternative to String, that uses a different memory layout, which would be better for this use case?
".to_owned();
    let repeat = 1_000_000;
    let instant = Instant::now();
    for _ in 0..repeat {
        let _ = with_drain(my_str.clone());
    }
    let drain_duration = instant.elapsed();
    let instant = Instant::now();
    for _ in 0..repeat {
        let _ = with_slice(my_str.clone());
    }
    let slice_duration = instant.elapsed();
    println!("{:?} {:?}", drain_duration, slice_duration);
}
/*
$ cargo run --release
    Finished release [optimized] target(s) in 0.00s
     Running `target/release/prog`
5.017018957s 310.466253ms
*/

CodePudding user response:

If you can't use slices because of the lifetimes, you could use a type that provides shared-ownership like SharedString from the shared-string crate or Str from the bytes-utils crate. The former looks more fully-featured but both provide methods that can take the prefix from a string in O(1) because the original data is never moved.

  • Related