I know that many questions have been asked about regex, but I can't see from them where my mistake is.
I want to find a specific word, preceded and followed by any amount of whitespace, and after this word I want to find an exact count of numbers whose digits lie in a specified range, each with any amount of whitespace before and after it.
regex word_to_match("^\\s*WORD\\s*([1-3]{1,5}\\s*){2}$");
This group, ([1-3]{1,5}\\s*) (e.g. \s1\s or \s\s12345\t), has to be repeated exactly two times. However, when I run my code, the regex correctly rejects three repetitions but accepts a single occurrence of the pattern, although it is supposed to require exactly two. Do you have any suggestions on this problem? Is it a matter of grouping? How can I force the regex to find this pattern exactly two times?
Grouping (additional brackets):
regex gate_not("^\\s*NOT\\s*(([1-3]{1,5}\\s*){2})$");
also doesn't work.
For example:
WORD 1 //correctly unmatched
WORD 123 //INCORRECTLY matched (number of groups is wrong)
EDIT: It looks as if the maximum number of digits satisfied the regex, but it gives incorrect results. Why?
CodePudding user response:
C++ regexes have a quirk here: a quantified capture group keeps only its last repetition, so to extract every number you have to loop yourself.
#include <iostream>
#include <regex>
#include <string>
#include <vector>

int main()
{
    // Split the regex into a fixed part and a repeated part.
    // std::regex keeps only the last repetition of a quantified capture group,
    // so loop explicitly to collect every number.
    std::regex prefix{ "^\\s*NOT\\s*" };
    std::regex repeated_part{ "([1-9]{1,9})(\\s*)" };
    std::smatch match;
    std::vector<int> numbers;
    std::string input{ " NOT 12345 123456 " };

    // Find the prefix part first.
    if (std::regex_search(input, match, prefix))
    {
        // Start searching just after the end of the prefix match.
        auto begin = match.suffix().first;

        // Then do the group looping yourself.
        while (std::regex_search(begin, input.cend(), match, repeated_part))
        {
            // match[1] is the first group (match[0] is the whole match, including the whitespace).
            auto number = match[1];
            // Convert the matched text to an int.
            numbers.push_back(std::stoi(number.str()));
            // Skip to the next part of the string.
            begin = match.suffix().first;
        }
    }

    // Repeat count check.
    if (numbers.size() == 2)
    {
        std::cout << "number 1 = " << numbers[0] << "\n";
        std::cout << "number 2 = " << numbers[1] << "\n";
    }
    return 0;
}
CodePudding user response:
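I've now realized that * matches any number of repetitions, including zero, so \\s* lets two numbers touch with no whitespace between them. That is why 123 was accepted: it can be split into 12 and 3. I should have used \\s+ to require at least one whitespace character before each number:
regex gate_not("^\\s*NOT(\\s+[1-3]{1,5}){2}\\s*$");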