std::regex_search vs std::regex_match

Time:07-22

I have tested (@{[^{}]*})* against @{whatever} on https://regex101.com/ and it matches correctly. So, despite regular expressions being a portability nightmare, I finally built the proper std::regex with:

const char *re_str = "@\\{[^\\{\\}]*\\}"; // @{[^{}]*} with curly braces escaped.

The escapes could be simplified using a raw string literal, R"()", but that's not the question. As I said, the regex works. Here is a simple example that extracts the pattern by iterating with regex_search:

#include <iostream>
#include <string>
#include <regex>

int main () {
  std::string str = "Bye @{foo} ! hi @{bar} !";
  std::smatch matches;
  std::string::const_iterator it( str.cbegin() );

  const char *re_str = "@\\{[^\\{\\}]*\\}"; // @{[^{}]*} with curly braces escaped
  // or: const char *re_str = R"(@\{[^\{\}]*\})";

  try {
    std::regex re(re_str);
    while (std::regex_search(it, str.cend(), matches, re)) {
      std::cout << matches[0] << std::endl;
      it = matches.suffix().first;
    }
  }
  catch (std::exception& e) {
    std::cout << e.what() << std::endl;
    return 1;
  }

  return 0;
}

Output:

g++ regex_search.cc && ./a.out
@{foo}
@{bar}

It works.

Still, I'm wondering whether there is a better approach from a performance point of view. So I tried std::regex_match instead of iterating with std::regex_search. I used a capture group for that, simply enclosing the previous regular expression within ()*:

const char *re_str = "(@\\{[^\\{\\}]*\\})*"; // (@{[^{}]*})* with curly braces escaped.

This is the source:

#include <iostream>
#include <string>
#include <regex>

int main () {
  std::string str = "Bye @{foo} ! hi @{bar} !";
  std::smatch matches;
  std::string::const_iterator it( str.cbegin() );

  const char *re_str = "(@\\{[^\\{\\}]*\\})*"; // (@{[^{}]*})* with curly braces escaped.

  try {
    std::regex re(re_str);
    if (std::regex_match(str, matches, re)) {
      // Print every sub-match; index 0 is the whole match.
      for (std::size_t k = 0; k < matches.size(); ++k) std::cout << "[" << k << "]: " << matches.str(k) << std::endl;
    }
  }
  catch (std::exception& e) {
    std::cout << e.what() << std::endl;
    return 1;
  }

  return 0;
}

Output:

g++ regex_match.cc && ./a.out

The output is empty!

I imagine that's not the way to use std::regex_match, although it is supposed to extract the matches for a capture group. Perhaps the regex is invalid this time (I don't know because, as I said, it is a portability nightmare).

So,

  1. Is using regex_search good enough, or is performance a real concern?
  2. Is regex_match a better algorithm, or is it equivalent?
  3. What's wrong with my source for regex_match?

Best regards, and thank you in advance.

CodePudding user response:

  • std::regex_search searches for the pattern anywhere in the input string.
  • std::regex_match checks if the pattern matches the entire input string.

Your pattern does not match your entire string, so std::regex_match will not find a match. You would need something like .*?(@\{[^{}]*\}).*?(@\{[^{}]*\}).* (with the braces escaped, as in your original pattern) if you wanted to match the entire string and extract the @{foo} and @{bar} portions.

CodePudding user response:

  1. Is using regex_search good enough, or is performance a real concern?
  2. Is regex_match a better algorithm, or is it equivalent?

I wouldn't say performance is the first thing you should take into account when comparing these two functions. Each has a more or less specific application area: you use regex_search when you iteratively match a specific pattern as a substring of the input, and regex_match when you need the whole input to match the pattern.

When the input is of arbitrary length and unknown at the time the std::regex is instantiated, or if the pattern required to capture all parts of the input is overly complicated, I would go with regex_search; otherwise I would choose regex_match (where the required substrings are acquired via capture groups).

  3. What's wrong with my source for regex_match?

As others already said, it needs to match the entire string. The pattern needs to look something like this for your particular string:

std::regex re(R"(.*?(@\{[^{}]*\}).*?(@\{[^{}]*\}).*)");

I assume that this is not as flexible as you expect it to be, so you are better off with the regex_search option you have already implemented.
