How do I remove some chars at the end of a string?

Time:12-11

I need to match a few words at the end of a string, handle them, and then remove them. How do I remove a few chars or bytes at the end of a String?

I am using the regex crate to match the string, but I can't find a way to remove chars at the end of the String.

Maybe something like this, except that the string may contain non-ASCII chars:

use lazy_static::lazy_static;
use regex::Regex;
fn func(s: &mut String) {
    lazy_static! {
        static ref RE: Regex = Regex::new(r"123").unwrap();
    }
    let cap = match RE.captures(s.as_str()) {
        Some(v) => v.get(0).unwrap(),
        None => panic!("Error"),
    };
    do_something(cap.as_str());
    // Pseudocode: String has no delete method; this is the operation I want.
    s.delete(0, cap.end());
}
fn do_something(s: &str) {
    assert_eq!(s, "123")
}
fn main() {
    let mut s = String::from("123456");
    func(&mut s);
    assert_eq!(s, "456");
}

I have seen the remove method, but its documentation says it is O(n). If so, calling it once per char (m times) would be O(nm), which is too slow for me.

CodePudding user response:

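You can use the regex crate's Match::start to get the byte offset where the match begins.

You can then use truncate to get rid of everything from that offset on. This is safe for non-ASCII text as well, because match offsets on a &str always fall on UTF-8 char boundaries.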

fn main() {
    let mut text: String = "this is a text with some garbage after!abc".into();
    let re = regex::Regex::new("abc$").unwrap();
    let m = re.captures(&text).unwrap();
    let g = m.get(0).unwrap();
    text.truncate(g.start());
    dbg!(text);
}
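
When the trailing text is a fixed literal rather than a pattern, the same idea (find where the tail starts, then truncate there) can be sketched with the standard library alone. This is a minimal illustration, not part of the original answer; take_suffix is a hypothetical helper name.

```rust
/// Removes `suffix` from the end of `s` and reports whether it was present.
/// `take_suffix` is a hypothetical helper, not a std or regex API.
fn take_suffix(s: &mut String, suffix: &str) -> bool {
    if s.ends_with(suffix) {
        // The suffix matched as a whole &str, so this byte offset
        // is guaranteed to lie on a char boundary.
        let new_len = s.len() - suffix.len();
        s.truncate(new_len);
        true
    } else {
        // Nothing to remove; the string is left untouched.
        false
    }
}
```

Calling take_suffix(&mut text, "abc") on the text above would drop the trailing "abc" and return true. Like the regex version, it only shortens the string in place and does not shift any remaining bytes.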

CodePudding user response:

What you're looking for is truncate, except with non-ASCII support.

For ASCII only, this works:

let mut s = String::from("123456789");
s.truncate(s.len() - 3);
assert_eq!(s, "123456");

However, since a String can contain Unicode characters that are not always 1 byte long, this doesn't work for non-ASCII text: truncate panics if the new length does not lie on a char boundary.
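
To make the boundary rule concrete, here is a small sketch that checks the offset with is_char_boundary before truncating, so a bad offset is reported instead of causing a panic. try_truncate is a hypothetical helper name used only for illustration.

```rust
/// Truncates `s` to `new_len` bytes only if that offset is a char boundary.
/// Returns true when the truncation was performed.
/// `try_truncate` is a hypothetical helper, not a std API.
fn try_truncate(s: &mut String, new_len: usize) -> bool {
    // is_char_boundary is false for offsets inside a multi-byte char
    // and for offsets past the end of the string.
    if s.is_char_boundary(new_len) {
        s.truncate(new_len);
        true
    } else {
        false
    }
}
```

For example, in "añb" the char ñ occupies bytes 1 and 2, so try_truncate with 2 refuses to cut, while 3 succeeds and leaves "añ".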

If you want non-ASCII support, there isn't an O(1) solution according to this answer. That answer does give an implementation using char_indices(), which I think is the best way unless I'm missing something.

There is also the unicode-truncate crate, which also seems to use char_indices(), so it might be worth a look.
