How to capture multiple instances of the same group in Rust regex?

Time:08-15

Here is my text:

hello: 3 32 2 8

I want to capture it with the following regex:

^([a-z]+):( [0-9]+)+$

I'm doing this:

use regex::Regex;

let txt = "hello: 3 32 2 8";
let re = Regex::new(r"^([a-z]+):( [0-9]+)+$")?;
let caps = re.captures(txt);
println!("{caps:?}");

I'm getting only the last number 8 in the second capture group:

Some(Captures({0: Some("hello: 3 32 2 8"), 1: Some("hello"), 2: Some(" 8")}))

I suspect this is the expected behavior of captures, but what is the workaround?

Answer:

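A capture group that is itself repeated, like ( [0-9]+)+, only records its last iteration, which is why group 2 holds " 8". Instead, I would simply capture the whole sequence of integers by putting one group around the repetition. Since the regex guarantees that this substring has the expected shape, we can then split and parse it with confidence (unless an integer has too many digits to fit in a u32).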

Note that the code below also tolerates extra whitespace around the name, the colon and the numbers.

use regex::Regex;

fn detect(txt: &str) -> Result<(&str, Vec<u32>), Box<dyn std::error::Error>> {
    let re = Regex::new(r"^\s*([a-z]+)\s*:((\s*[0-9]+)+)\s*$")?;
    let caps = re.captures(txt).ok_or("no match")?;
    // starting from here, we know that all the expected substrings exist
    // thus we can unwrap() the options/errors
    let name = caps.get(1).unwrap().as_str();
    let values = caps
        .get(2)
        .unwrap()
        .as_str()
        .split_whitespace() // \s in the regex is Unicode-aware, so split on Unicode whitespace too
        .filter_map(|s| s.parse().ok()) // FIXME: overflow ignored
        .collect();
    Ok((name, values))
}

fn main() {
    for txt in [
        "hello: 3 32 2 8",
        "hello :\t3 32   2 8",
        "\thello :\t3 32   2 8  ",
        "hello:",
        "hello:9999999999 3",
    ] {
        println!("{:?} ~~> {:?}", txt, detect(txt));
    }
}
/*
"hello: 3 32 2 8" ~~> Ok(("hello", [3, 32, 2, 8]))
"hello :\t3 32   2 8" ~~> Ok(("hello", [3, 32, 2, 8]))
"\thello :\t3 32   2 8  " ~~> Ok(("hello", [3, 32, 2, 8]))
"hello:" ~~> Err("no match")
"hello:9999999999 3" ~~> Ok(("hello", [3]))
*/