Regex exact repetition not working in C++

Time:10-10

I know that many questions have been asked about regex, but I can't see from them where my mistake is.

I want to find a specific word, preceded and followed by any amount of whitespace, and after this word I want to find an exact count of numbers whose digits lie in a specified range, each with any amount of whitespace before and after it.

regex word_to_match("^\\s*WORD\\s*([1-3]{1,5}\\s*){2}$");

This group:

([1-3]{1,5}\\s*)

(e.g. \s1\s, \s\s12345\t) has to be repeated exactly twice. However, when I run my code, the regex correctly rejects three repetitions, but it accepts a single occurrence of this pattern, although it is supposed to match exactly two. Do you have any suggestions on this problem? Is it a matter of grouping? How can I force the regex to match this pattern exactly twice?

Grouping (additional brackets):

regex gate_not("^\\s*NOT\\s*(([1-3]{1,5}\\s*){2})$");

also doesn't work.

For example:

    WORD 1    // correctly unmatched
    WORD 123  // INCORRECTLY matched (number of groups is wrong)

EDIT: It looks as if the regex is satisfied whenever the total number of digits does not exceed the maximum, which gives incorrect results. Why?

Answer:

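C++ regexes do apply a quantifier such as {2} to a group, but a capture group inside that quantifier only remembers its last repetition, so they won't hand you each repeated number automatically. To get every number, split the regex into a fixed part and a repeated part and loop explicitly: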

#include <iostream>
#include <regex>
#include <string>
#include <vector>

int main()
{
    // Split your regex into a fixed part and a repeated part.
    // std::regex keeps only the last repetition of a quantified capture group,
    // so loop over the repeated part explicitly to collect every number.
    std::regex prefix{ "^\\s*NOT\\s*" };
    std::regex repeated_part{ "([1-9]{1,9})(\\s*)" };
    std::smatch match;
    std::vector<int> numbers;

    std::string input{ "   NOT   12345 123456  " };

    // Find the prefix part first.
    if (std::regex_search(input, match, prefix))
    {
        // Position a string iterator just after the match of the prefix.
        auto begin = match.suffix().first;

        // Then do the group looping yourself. match_continuous forces each
        // match to start exactly at 'begin', so no characters are skipped.
        while (std::regex_search(begin, input.cend(), match, repeated_part,
                                 std::regex_constants::match_continuous))
        {
            // match[1] is the first capture group (the digits);
            // match[0] is the whole match including the whitespace.
            numbers.push_back(std::stoi(match[1].str()));

            // Skip to the next part of the string.
            begin = match.suffix().first;
        }
    }

    // Check the repeat count.
    if (numbers.size() == 2)
    {
        std::cout << "number 1 = " << numbers[0] << "\n";
        std::cout << "number 2 = " << numbers[1] << "\n";
    }

    return 0;
}

Answer:

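Now I've realized that \\s* matches ANY number of whitespace characters, including zero, so the digits of a single number can be split across both repetitions (WORD 123 becomes 12 and 3). I should have used \\s+ to require at least one whitespace character before each number: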

regex gate_not("^\\s*NOT(\\s+[1-3]{1,5}){2}\\s*$");