How to use a regex to match all content before specific symbols in C++?

Time:10-17

I want to extract the front (first) part of a string, before certain symbols. For example, from "ABC, ZXC" and "AB.QWE,CV" I want to get "ABC" and "AB". Also, if the sentence contains Chinese characters, such as "1月1日,天气晴", how do I get the front part (1月1日)?

This is easy to do in Python:

import re
# If no delimiter is found, fall back to the whole content.
front_part = re.findall(r'(.*?)[.,]', content)[0] if re.findall(r'[.,]', content) else content

However, I tried the following code in C++, but it still returns the whole content:

#include <regex>
#include <string>
#include <iostream>

int main()
{
    std::string content = "AB.AC";
    std::string front_part;
    std::smatch frt_pt_sm;
    std::regex frt_pt_patt(".*[.,]");
    if (std::regex_match(content, frt_pt_sm, frt_pt_patt))
    {
        for (unsigned i = 0; i < frt_pt_sm.size(); ++i)
        {
            std::cout << frt_pt_sm[i] << std::endl;
        }
        front_part = frt_pt_sm[0];
    }
    return 0;
}

I am a novice in C++, so any suggestion is helpful!

CodePudding user response:

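C++ regexes (std::regex with the default ECMAScript grammar) do accept the lazy (.*?) that you're using in Python, but std::regex_match only succeeds when the pattern matches the entire string, so a pattern like ".*[.,]" cannot pick out just the front part.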

In C++ you'll want to use something like: [^.,]+ to match the part up to (but not including) the first . or ,.
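As a minimal sketch of that suggestion, the negated character class can be paired with std::regex_search, which accepts a match on part of the string, whereas std::regex_match requires the pattern to cover all of it. The helper name front_by_regex_search is hypothetical. The sketch uses [^.,]* rather than [^.,]+ so that a string starting with a delimiter yields an empty front part; that choice is an assumption about the desired behaviour.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the original post): returns the text before
// the first '.' or ',', or the whole string when neither occurs.
std::string front_by_regex_search(const std::string& content) {
    // ^ anchors at the start; [^.,]* consumes every non-delimiter character.
    static const std::regex front_patt("^[^.,]*");
    std::smatch m;
    // regex_search accepts a partial match, so the text after the
    // delimiter does not have to be described by the pattern.
    if (std::regex_search(content, m, front_patt)) {
        return m[0].str();
    }
    return content;  // defensive fallback; the pattern can match empty text
}
```

Called with "AB.QWE,CV" this should return "AB", and with a string that has no delimiter it should return the string unchanged.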

On the other hand, given how simple the pattern is, you could easily forgo regexes altogether:

std::string input = "AB.QWE,CV";
auto pos = input.find_first_of(".,");
auto front = input.substr(0, pos);
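The snippet above also covers the two remaining cases in the question, which the following sketch makes explicit. The function name front_part_find is hypothetical, and the Chinese example assumes the string is UTF-8 encoded: every byte of a multi-byte UTF-8 character is 0x80 or higher, so a byte-wise search for the ASCII '.' and ',' cannot land inside a character. Note that it will not find the full-width comma "，", only the ASCII one used in the question.

```cpp
#include <string>

// Hypothetical helper name; the delimiter set ".," comes from the question.
std::string front_part_find(const std::string& content) {
    const auto pos = content.find_first_of(".,");
    // If no delimiter is found, pos is std::string::npos and
    // substr(0, npos) returns the whole string, as the question requires.
    return content.substr(0, pos);
}
```

For example, front_part_find("1月1日,天气晴") should give "1月1日", and a string without any delimiter comes back unchanged.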

CodePudding user response:

Note that the first element of std::smatch is the whole match; the capture groups start at index 1. If you just want the first part of the string before . or ,, you can use a capture group like this.

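#include <iostream>
#include <regex>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> contents {"AB.AC", "AB.QWE,CV", "ABC, ZXC"};
    std::string front_part;
    std::smatch frt_pt_sm;
    // Note: the unescaped . in (.|,) matches any character, not only a literal dot.
    std::regex frt_pt_patt(R"((\w+)(.|,)+(\s*\w+))");
    for (const auto& content : contents) {
        std::cout << "content: " << content << std::endl;

        if (std::regex_match(content, frt_pt_sm, frt_pt_patt))
        {
            // Index 0 is the whole match; index 1 is the front part.
            for (unsigned i = 0; i < frt_pt_sm.size(); ++i)
            {
                std::cout << frt_pt_sm[i] << std::endl;
            }
            front_part = frt_pt_sm[1].str();
            std::cout << "front part: " << front_part << std::endl;
        }
    }
    return 0;
}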

The result is as follows.

content: AB.AC
AB.AC
AB
A
C
front part: AB
content: AB.QWE,CV
AB.QWE,CV
AB
C
V
front part: AB
content: ABC, ZXC
ABC, ZXC
ABC
X
C
front part: ABC
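In the output above, the second and third groups hold single letters because the . inside (.|,) is unescaped and matches any character. A tighter variant, offered only as a sketch, puts the delimiters in a character class and keeps a single capture group. The function name front_by_capture and the exact pattern are assumptions, not part of the original answer.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: whole-string match with one capture group.
std::string front_by_capture(const std::string& content) {
    // Group 1: everything before the first '.' or ','.
    // The optional non-capturing group swallows the delimiter and the rest
    // ([\s\S] also matches newlines), so regex_match can cover the whole string.
    static const std::regex patt(R"(([^.,]*)(?:[.,][\s\S]*)?)");
    std::smatch m;
    if (std::regex_match(content, m, patt)) {
        return m[1].str();  // index 0 would be the whole match
    }
    return content;
}
```

Because the tail is optional, a string without any delimiter still matches and is returned whole through group 1.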

More elegantly, you can use a regex iterator to split the string at the delimiters . and ,, like this.

#include <iostream>
#include <regex>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> contents {"AB.AC", "AB.QWE,CV", "ABC, ZXC"};

    // Each match is either a single delimiter or a run of non-delimiter characters.
    std::regex frt_pt_patt("([.,]|[^.,]+)");
    for (const auto& content : contents) {
        std::cout << "content: " << content << std::endl;
        std::sregex_iterator rit(content.begin(), content.end(), frt_pt_patt);
        std::sregex_iterator rend;  // a default-constructed iterator marks the end
        while (rit != rend) {
            std::cout << rit->str() << std::endl;
            ++rit;
        }
    }
    return 0;
}

Result.

content: AB.AC
AB
.
AC
content: AB.QWE,CV
AB
.
QWE
,
CV
content: ABC, ZXC
ABC
,
 ZXC