Why is my string extraction function using back referencing in regex not working as intended?

Time:12-05

Extraction Function

string extractStr(string str, string regExpStr) {
    regex regexp(regExpStr);
    smatch m;
    regex_search(str, m, regexp);
    string result = "";
    for (string x : m)
        result = result + x;
    return result;
}

The Main Code

#include <iostream>
#include <regex>

using namespace std;

string extractStr(string, string);

int main(void) {
    string test = "(1+1)*(n+n)";
    cout << extractStr(test, "n\\+n") << endl;
    cout << extractStr(test, "(\\d)\\+\\1") << endl;
    cout << extractStr(test, "([a-zA-Z])[+-/*]\\1") << endl;
    cout << extractStr(test, "([a-zA-Z])[+-/*]([a-zA-Z])") << endl;
    return 0;
}

The Output

String = (1+1)*(n+n)
n\+n = n+n
(\d)\+\1 = 1+11
([a-zA-Z])[+-/*]\1 = n+nn
([a-zA-Z])[+-/*]([a-zA-Z]) = n+nnn

If anyone could kindly point out the error I've made, or point me to a similar question on SO that I missed while searching, it would be greatly appreciated.

CodePudding user response:

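Regexes in C++ don't work quite like "normal" regexes, especially when you are looking for multiple matches. Iterating over a std::smatch yields the whole match first (m[0]) and then each capture group, which is why your output has extra characters appended. To find every match you have to call std::regex_search in a loop. I have also included some C++ tips in the code (constness and references).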

#include <cassert>
#include <iostream>
#include <sstream>
#include <regex>
#include <string>


// using namespace std; don't do this!
// https://stackoverflow.com/questions/1452721/why-is-using-namespace-std-considered-bad-practice

// pass strings by const reference
// 1. const, you promise not to change them in this function
// 2. by reference, you avoid making copies

std::string extractStr(const std::string& str, const std::string& regExpStr)
{
    std::regex regexp(regExpStr);
    std::smatch m;
    std::ostringstream os; // a stream is a convenient way to build up the output string

    auto begin = str.cbegin();
    bool comma = false;

    // std::regex_search finds one match per call, so you need to loop to get them all
    while (std::regex_search(begin, str.end(), m, regexp))
    {
        if (comma) os << ", ";
        os << m[0];
        comma = true;
        begin = m.suffix().first;
    }

    return os.str();
}

// small helper function to produce nicer output for your tests.
void test(const std::string& input, const std::string& regex, const std::string& expected)
{
    
    auto output = extractStr(input, regex);
    if (output == expected)
    {
        std::cout << "test succeeded : output = " << output << "\n";
    }
    else
    {
        std::cout << "test failed : output = " << output << ", expected : " << expected << "\n";
    }
}

int main(void)
{
    std::string input = "(1+1)*(n+n)";
    
    test(input, "n\\+n", "n+n");
    test(input, "(\\d)\\+\\1", "1+1");
    test(input, "([a-zA-Z])[+-/*]\\1", "n+n");
    
    return 0;
}