What is wrong with the C++ regex code below to extract names and values?

Time:12-21

#include <iostream>
#include <regex>

using namespace std;

int main()
{
    string s = "foo:12,bar:456,b:az:0,";
    regex c("(.*[^,]):([0-9]+),");
    smatch sm;
    if(regex_search(s, sm, c)) {
        cout << "match size:"<<sm.size()<<endl;
        for(int i=0; i < sm.size();i++){
            cout << "grp1 - " << sm[i] << "\tgrp2 - " << sm[i+1] << endl;
        }
    }
    return 0;
}

I want to extract the names and their corresponding values. I wrote the code above, but I get the following output:

match size:3
grp1 - foo:12,bar:456,b:az:0,   grp2 - foo:12,bar:456,b:az
grp1 - foo:12,bar:456,b:az      grp2 - 0
grp1 - 0        grp2 - 

I expected the following:

match size:3
grp1 - foo,   grp2 - 12
grp1 - bar    grp2 - 456
grp1 - b:az   grp2 - 0

Answer:

You are confusing multiple match extraction with retrieving capturing group values from a single match (the one yielded by the regex_search function). You need to use a regex iterator to get all matches.

Here, match size:3 means you have the whole match value (Group 0), a capturing group 1 value (Group 1, the (.*[^,]) value) and a Group 2 value (captured with ([0-9]+)).

Also, the .* at the start of your pattern is greedy: it grabs as many chars other than line break chars as possible, so the first match swallows nearly the whole string and you can't use your pattern for multiple match extraction.

See this C++ demo:

#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main() {
    string s = "foo:12,bar:456,b:az:0,";
    // Group 1: lazy run of non-comma chars (the name); Group 2: digits (the value)
    regex c("([^,]+?):([0-9]+)");
    int matches = 0;
    smatch sm;
    // A default-constructed iterator marks the end of the match sequence
    sregex_iterator iter(s.begin(), s.end(), c);
    sregex_iterator end;
    while (iter != end) {
        sm = *iter;
        cout << "grp1 - " << sm[1].str() << ",   grp2 - " << sm[2].str() << endl;
        matches++;
        ++iter;
    }
    cout << "Matches: " << matches << endl;
}

Output:

grp1 - foo,   grp2 - 12
grp1 - bar,   grp2 - 456
grp1 - b:az,   grp2 - 0
Matches: 3

The regex matches:

  • ([^,]+?) - Group 1: one or more chars other than a comma, as few as possible
  • : - a colon
  • ([0-9]+) - Group 2: one or more digits.
