Replacing multiple matches of the same regex in a string

Time:06-27

I'm new to all things std::regex. I attempted to write code which will replace a certain pattern as defined in the DEFAULTS_SS_MARKER_REGEX below.

I can do it once. The problem arises when I try to figure out how to replace multiple instances of that pattern.

The source strings might contain just one instance of the pattern, multiple instances of the exact same text that matches the pattern, multiple instances of different text that matches the pattern, or a combination of the last two.

Where I get stuck is how to replace each match while ignoring the text that has already been matched and substituted.

What is the best way to perform multiple replacements of these "markers" in a string?

Attempts to code

using std::regex_match, std::regex_replace, std::regex;

const char * DEFAULTS_SS_MARKER_REGEX = R"(\$\{([\w ._-]+):([\w ._-]+):([\w ._-]+)\})";
#pragma region Sample Matches
/*
${Spreadsheet:Sheet:key}
${Hello1:b2:c3}
${Hello1.scss:b2:c3}
${Hello1_-scss:b2:c3}
abc${Hello1:b2:c3}def
${a:b:c}
*/
#pragma endregion Sample Matches


// first attempt, got stuck
std::string r = "${abc:def:ghi} and ${123:456:789} and ${abc:def:ghi}"; // test string to replace
bool checkForMoreMatches = true;
while (checkForMoreMatches) {
    std::cmatch m;
    auto x = regex_search(r.c_str(), m, regex(DEFAULTS_SS_MARKER_REGEX, regex::icase), std::regex_constants::match_default);
    if (m.size() == 4) {
        auto spreadsheet = m[1].str();
        auto sheet       = m[2].str();
        auto key         = m[3].str();
        std::string zz = lookup(spreadsheet, sheet, key);
        r = regex_replace(r, regex(DEFAULTS_SS_MARKER_REGEX, regex::icase), zz);
    } else checkForMoreMatches = m.size();
}

// second attempt at writing it, got stuck

// Default Spreadsheet Marker
std::string r = "${abc:def:ghi} and ${123:456:789} and ${abc:def:ghi}"; // test string to replace
std::smatch m;
while (std::regex_search(r, m, regex(DEFAULTS_SS_MARKER_REGEX, regex::icase))) {
    if (m.size() == 4) {
        auto spreadsheet = m[1].str();
        auto sheet = m[2].str();
        auto key = m[3].str();
        std::string zz = lookup(spreadsheet, sheet, key);
        r = regex_replace(r, regex(DEFAULTS_SS_MARKER_REGEX, regex::icase), zz);
    }
}

// lookup function used by all attempts
std::string lookup(std::string spreadsheet, std::string sheet, std::string key) {
    // some placeholder code, the real code would lookup a value from a spreadsheet 
    return "ss=" + spreadsheet + ",sheet=" + sheet + ",key=" + key;
}
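
Both attempts above call regex_replace without any flags, so every marker in the string is replaced at once with the lookup result of the first match. One possible fix that stays close to the second attempt is sketched below: it passes std::regex_constants::format_first_only so that each loop iteration substitutes only the first remaining match. The helper name replaceOneAtATime is made up for illustration. The sketch assumes the lookup result never contains text that matches the marker pattern again (otherwise the loop would not terminate) and never contains `$`, which regex_replace would interpret as a format escape.

```cpp
#include <regex>
#include <string>

// Placeholder lookup with the same shape as the one in the question.
std::string lookup(const std::string& spreadsheet, const std::string& sheet, const std::string& key) {
    return "ss=" + spreadsheet + ",sheet=" + sheet + ",key=" + key;
}

// Hypothetical helper: replaces one marker per loop iteration.
std::string replaceOneAtATime(std::string text) {
    const std::regex markers{R"(\$\{([\w ._-]+):([\w ._-]+):([\w ._-]+)\})", std::regex::icase};
    std::smatch m;
    while (std::regex_search(text, m, markers)) {
        // Compute the replacement before text is modified, because m refers into text.
        const std::string value = lookup(m[1].str(), m[2].str(), m[3].str());
        // format_first_only restricts the substitution to the first match,
        // which is the same match regex_search just reported.
        text = std::regex_replace(text, markers, value,
                                  std::regex_constants::format_first_only);
    }
    return text;
}
```

Calling replaceOneAtATime on the test string from the attempts should give each marker its own lookup result. The string is rescanned from the start on every iteration, so this is simpler than an iterator-based approach but does more work on long inputs.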

Postscript

@sigma wrote some very nice, short code that worked correctly with the sample lookup function (as long as the comment in it was ignored) but unfortunately did not work for my real use case.

The sample lookup also does not really exercise the new code he replaced his original answer with, once I told him the original didn't work for my situation.

I'm therefore including a lookup function that breaks sigma's original answer, in case anyone is interested in the future.

I also include my full test code, with both sigma's original and new answers.

#include <algorithm>
#include <cctype>
#include <iostream>
#include <iterator>
#include <regex>
#include <string>

// This code answered by sigma to my question here:
// [c++ - Replacing multiple matches of the same regex in a string - Stack Overflow](https://stackoverflow.com/questions/72763724/replacing-multiple-matches-of-the-same-regex-in-a-string/72764911#72764911)

#pragma region Lookup functions

// lookup function that works with sigma's original code (see my comment at https://stackoverflow.com/a/72764911/270143)
std::string lookup(std::string spreadsheet, std::string sheet, std::string key) {
    // some placeholder code, the real code would lookup a value from a spreadsheet
    return "ss=" + spreadsheet + ",sheet=" + sheet + ",key=" + key;
}

std::string upper(const std::string& str)
{
    std::string result;
    // go through unsigned char: passing a negative char value to toupper is undefined behavior
    std::transform(str.begin(), str.end(), std::back_inserter(result),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return result;
}

// lookup function I devised that will break sigma's original code
std::string lookupUpper(std::string spreadsheet, std::string sheet, std::string key) {
    // some placeholder code, the real code would lookup a value from a spreadsheet
    return "ss=" + upper(spreadsheet) + ",sheet=" + upper(sheet) + ",key=" + upper(key);
}

#pragma endregion Lookup functions

#pragma region Sigmas original answer

// sigma's original code but not good for our needs (see my comment https://stackoverflow.com/a/72764911/270143)
void testDefaultsSpreadsheetFieldMarkersReplaceSimpleLookup()
{
    std::regex markers{ R"(\$\{([\w ._-]+):([\w ._-]+):([\w ._-]+)\})", std::regex::icase };
    const std::string s = "${abc:def:ghi} and ${123:456:789} and ${abc:def:ghi}";
    std::string q = s;
    std::string r = s;

    // The trick here is the special syntax accepted by the "fmt" argument:
    // it will replace $n by the nth capture group
    std::cout << "SIGMA ORIGINAL ANSWER with Original Lookup" << std::endl;
    std::cout << std::regex_replace(q, markers, lookup("$1", "$2", "$3")) << std::endl;
    std::cout << "SIGMA ORIGINAL ANSWER with More Complex Lookup (breaks Sigma code)" << std::endl;
    std::cout << std::regex_replace(r, markers, lookupUpper("$1", "$2", "$3")) << std::endl;
}

#pragma endregion Sigma original answer

#pragma region Sigma new answer

std::string replace(std::string const& s, std::regex const& re, const bool originalLookup)
{
    std::sregex_iterator rbegin{ s.begin(), s.end(), re };
    std::sregex_iterator rend{};

    if (rbegin == rend)
        return s;

    std::string out;
    for (auto i = rbegin; i != rend; ++i) {
        auto match = *i;
        out += match.prefix();
        if (match.size() == 4) {
            auto spreadsheet = match[1].str();
            auto sheet = match[2].str();
            auto key = match[3].str();
            out += originalLookup ? lookup(spreadsheet, sheet, key) : lookupUpper(spreadsheet, sheet, key);
        }
        if (std::next(i) == rend)
            out += match.suffix();
    }

    return out;
}

void testDefaultsSpreadsheetFieldMarkersReplaceWhatWeNeed()
{
    std::regex markers{ R"(\$\{([\w ._-]+):([\w ._-]+):([\w ._-]+)\})", std::regex::icase };
    const std::string s = "${abc:def:ghi} and ${123:456:789} and ${abc:def:ghi}";
    std::string q = s;
    std::string r = s;

    // replace() visits every match and calls the lookup with the text
    // actually captured by each group, so no "$n" format syntax is involved
    std::cout << "SIGMA NEW ANSWER with Original Lookup" << std::endl;
    std::cout << replace(q, markers, true) << std::endl;
    std::cout << "SIGMA NEW ANSWER with More Complex Lookup" << std::endl;
    std::cout << replace(r, markers, false) << std::endl;
}

#pragma endregion Sigma new answer
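
To make it clearer why the more complex lookup breaks the original answer, here is a small sketch. The lookup call is an ordinary function call, so it runs once, before regex_replace starts, and it receives the literal strings "$1", "$2" and "$3" rather than the captured text. Uppercasing "$1" leaves it unchanged, so the format string that reaches regex_replace is the same as with the simple lookup and the captured text is never transformed. The helper name formatStringApproach is made up for this illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <string>

std::string upper(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}

// Placeholder lookup that transforms its arguments.
std::string lookupUpper(const std::string& spreadsheet, const std::string& sheet, const std::string& key) {
    return "ss=" + upper(spreadsheet) + ",sheet=" + upper(sheet) + ",key=" + upper(key);
}

// Hypothetical helper mirroring the "$n" format-string approach.
std::string formatStringApproach(const std::string& text) {
    const std::regex markers{R"(\$\{([\w ._-]+):([\w ._-]+):([\w ._-]+)\})", std::regex::icase};
    // lookupUpper runs exactly once, here, on the literal strings "$1", "$2", "$3".
    // The result is "ss=$1,sheet=$2,key=$3": nothing was uppercased.
    const std::string fmt = lookupUpper("$1", "$2", "$3");
    // regex_replace only substitutes the captures into that fixed format string.
    return std::regex_replace(text, markers, fmt);
}
```

With this helper, a marker such as ${abc:def:ghi} comes out with lowercase abc, def and ghi even though lookupUpper was used. Any lookup that has to inspect the captured values, such as a real spreadsheet query, needs to be called once per match instead.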

Answer:

Perhaps not quite as elegant, but this should be compatible with your actual code.

#include <iostream>
#include <regex>
#include <string>

// lookup function used by all attempts
std::string lookup(std::string spreadsheet, std::string sheet, std::string key) {
    // some placeholder code, the real code would lookup a value from a spreadsheet 
    return "ss=" + spreadsheet + ",sheet=" + sheet + ",key=" + key;
}

std::string replace(std::string const& s, std::regex const& re)
{
    std::sregex_iterator rbegin {s.begin(), s.end(), re};
    std::sregex_iterator rend {};

    if (rbegin == rend)
        return s;

    std::string out;
    for (auto i = rbegin; i != rend; ++i) {
        auto match = *i;
        out += match.prefix();
        if (match.size() == 4) {
            auto spreadsheet = match[1].str();
            auto sheet = match[2].str();
            auto key = match[3].str();
            out += lookup(spreadsheet, sheet, key);
        }
        if (std::next(i) == rend)
            out += match.suffix();
    }

    return out;
}

int main()
{
    std::regex markers {R"(\$\{([\w ._-]+):([\w ._-]+):([\w ._-]+)\})", std::regex::icase};
    std::string r = "${abc:def:ghi} and ${123:456:789} and ${abc:def:ghi}";
    std::cout << replace(r, markers) << '\n';
}
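
If the lookup needs to vary, one possible variation on the function above is to pass the per-match logic in as a callback. The sketch below is not part of the answer as posted: the name replaceWith and the std::function parameter are assumptions made for illustration. It also tracks the end of the previous match instead of using prefix() and suffix(), which avoids the std::next check for the last match.

```cpp
#include <functional>
#include <regex>
#include <string>

// Hypothetical helper: calls fn once per match and splices its result
// into the output in place of the matched text.
std::string replaceWith(const std::string& s, const std::regex& re,
                        const std::function<std::string(const std::smatch&)>& fn) {
    std::string out;
    auto last = s.cbegin();  // end of the previous match (start of s at first)
    for (std::sregex_iterator it{s.cbegin(), s.cend(), re}, end{}; it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);  // unmatched text before this match
        out += fn(m);                  // replacement computed from the real captures
        last = m[0].second;
    }
    out.append(last, s.cend());        // trailing text (or all of s if nothing matched)
    return out;
}
```

A call could then look like `replaceWith(r, markers, [](const std::smatch& m) { return lookup(m[1].str(), m[2].str(), m[3].str()); })`, with lookup swapped for whatever function does the real spreadsheet query.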