Best way to split a string into Vec<String> at a character, when not preceded by another character

Time:03-28

Suppose I want to split a String like abc'xyz?'zzz' at every ' into a Vec<String>, but not when the ' is preceded by a ?. I want to achieve this without regex lookarounds, since I can't trust the input.

I can assume that the input is valid UTF-8.

What would be the fastest (and probably most memory-efficient) way to achieve this in Rust?

I thought about iterating over the String and saving the current substring into a variable whenever the next char is ' and the current char is not ?, using plain char comparison. I would then move that variable's value into a Vec<String>.

Is this a good idea, or are there more efficient (time- and memory-wise) ways to achieve this?

CodePudding user response:

The most idiomatic way to achieve this would be to make it into an implementation of Iterator, taking &str and producing &str.

Here is an example implementation. It assumes that a trailing ' on the input string should not produce an empty element after it, and that an empty string should not produce any element. Note that no copies are made, since we are only dealing with string slices. If you want to produce a Vec<String>, you can map the iterator over str::to_owned: .map(str::to_owned).collect::<Vec<_>>()

use std::str::CharIndices;

// A verbose name for an oddly specific concept.
struct SplitStringAtCharNotFollowingCharIterator<'a> {
    text: &'a str,
    chars: CharIndices<'a>,
}

impl<'a> SplitStringAtCharNotFollowingCharIterator<'a> {
    pub fn new(text: &'a str) -> Self {
        Self { text, chars: text.char_indices() }
    }
}

impl<'a> Iterator for SplitStringAtCharNotFollowingCharIterator<'a> {
    type Item = &'a str;
    
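    fn next(&mut self) -> Option<&'a str> {
        let first = self.chars.next();

        let (start, mut prior) = match first {
            // Input exhausted: no more elements.
            None => return None,
            // A delimiter at the very start of an element yields an empty element.
            Some((_, '\'')) => return Some(""),
            Some(v) => v,
        };

        loop {
            let next = self.chars.next();

            prior = match (prior, next) {
                // End of input: the rest of the text is the final element.
                (_, None) => return Some(&self.text[start..]),

                // The previous char was '?', so this char never splits, even if it is '\''.
                ('?', Some((_, c))) => c,

                // A delimiter not preceded by '?' ends the current element.
                (_, Some((end, '\''))) => return Some(&self.text[start..end]),

                // Any other char simply extends the current element.
                (_, Some((_, c))) => c,
            }
        }
    }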
}
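
The conversion to Vec<String> mentioned above does not depend on the particular iterator. As a minimal sketch, the helper below (collect_owned is a hypothetical name, not part of the answer's code) accepts any iterator that yields &str, so it would work with the iterator above as well as with the standard str::split used here as a stand-in:

```rust
/// Hypothetical helper: copies every borrowed piece into an owned String.
/// Works with any iterator over `&str`, for example `str::split` or a
/// custom splitting iterator like the one in this answer.
fn collect_owned<'a>(parts: impl Iterator<Item = &'a str>) -> Vec<String> {
    parts.map(str::to_owned).collect()
}
```

The allocation happens only at this final step, so callers that are happy with borrowed slices can skip it entirely.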


CodePudding user response:

I don't think you need to overcomplicate this; a simple for loop will do. This also makes it easy to adjust exactly how the splitting works, e.g. whether to include or exclude the delimiter and what to do with empty matches.

fn split(s: &str) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut cur = String::new();
    let mut last_char = None;
    for c in s.chars() {
        if c == '\'' && last_char != Some('?') {
            chunks.push(std::mem::take(&mut cur));
        } else {
            cur.push(c);
        }
        last_char = Some(c);
    }
    chunks.push(cur);
    chunks
}

If you wanted to produce a Vec<&str>, you would need to do more work to maintain references into the existing string, but since we are returning a Vec<String>, we can simply copy the characters one by one.
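
For comparison, here is a rough sketch of what that extra work might look like. The function name split_borrowed is hypothetical; the idea is to track the byte offset where the current chunk starts and slice the input instead of building new strings. It keeps the same rule as the loop above, including the empty chunk after a trailing delimiter.

```rust
/// Hypothetical borrowing variant of the loop above: splits at `'`
/// unless the immediately preceding char is `?`, without copying.
fn split_borrowed(s: &str) -> Vec<&str> {
    let mut chunks = Vec::new();
    // Byte offset at which the current chunk begins.
    let mut start = 0;
    let mut last_char = None;
    for (i, c) in s.char_indices() {
        if c == '\'' && last_char != Some('?') {
            chunks.push(&s[start..i]);
            // `'` is a single byte in UTF-8, so the next chunk starts at i + 1.
            start = i + 1;
        }
        last_char = Some(c);
    }
    // Whatever remains after the last delimiter (possibly empty).
    chunks.push(&s[start..]);
    chunks
}
```

With the input from the question, abc'xyz?'zzz', this yields the chunks abc, xyz?'zzz and a final empty string, matching the owned version.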
