Why doesn't (...)? regular expression capture the string?

Time:12-25

I have code where a QString is modified using a regular expression:

QString str; // str is the string that shall be modified
QString pattern, after; // pattern and after are parameters provided as arguments

str.replace(QRegularExpression(pattern), after);

Whenever I need to append something to the end of the string I use the arguments:

QString pattern("$");
QString after("ending");

Now I have a case where the same pattern is applied twice, but the string should be appended only once. I expected the following to work (assuming the initial string doesn't end with "ending"):

QString pattern("(ending)?$");
QString after("ending");

But when applied twice, this pattern produces a double ending: "<initial string>endingending".

It looks like the ()? expression is lazy: it only captures the expression in parentheses if I force it with a sub-expression before it:

QString pattern("string(ending)?$");
QString after("ending");

QString str("Init string");
str.replace(QRegularExpression(pattern), after);
// str == "Init ending"

What's wrong with the "()?" construct (why is it lazy), and how can I achieve my goal?

I'm using Qt 5.14.0 (due to some dependencies I cannot use Qt6).

CodePudding user response:

The ? in your regex tells the regex engine that the string can optionally end with ending. Your question is a bit unclear, but if I understand it correctly, what you need instead is a negative lookbehind. Changing your pattern as follows should do the trick:

QString pattern("(?<!ending)$");

This matches the end of the string only when it is not already preceded by ending, so a second application finds no match and leaves the string unchanged.

CodePudding user response:

OK, I have an explanation of why this happens (so the question in the title is answered).

Basically, QString::replace (like std::regex_replace) finds two matches and performs two replacements: first one where the capture group matches ending, and then a second, empty match at the very end of the string, where the capture group does not participate and only $ matches.

Here is a demo in plain C++ which illustrates the issue:

#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::string s;
    auto r = std::regex{"(ending)?$"};
    auto after = "ending";
    // Read test strings from stdin, one per line.
    while(getline(std::cin, s)) {
        std::cout << "s: " << s << '\n';
        std::cout << "replace: " << std::regex_replace(s, r, after) << '\n';
        // List every match that regex_replace would substitute.
        for (auto i = std::regex_iterator{s.begin(), s.end(), r};
            i != decltype(i){};
            ++i) {
            std::cout << "found: " << i->str() << " capture: " << i->str(1);
            std::cout << '\n';
        }
        std::cout << "------------\n";
    }

    return 0;
}

https://godbolt.org/z/e85ajb9aP

Now that you know the root cause, you can try to address the issue. For now, I have no idea how to do it without a hacky solution.
