I have tested (@{[^{}]*})* against input such as @{whatever}, and it matches correctly (https://regex101.com/).
So, despite the portability nightmare that regular expressions are, I finally built a working std::regex with:
const char *re_str = "@\\{[^\\{\\}]*\\}"; // @{[^{}]*} with curly braces escaped.
The escapes could be simplified with a raw string literal, R"(...)", but that's not the question. As I said, the regex works. Here is a simple snippet that extracts the pattern by iterating with regex_search:
#include <iostream>
#include <string>
#include <regex>

int main () {
    std::string str = "Bye @{foo} ! hi @{bar} !";
    std::smatch matches;
    std::string::const_iterator it( str.cbegin() );
    const char *re_str = "@\\{[^\\{\\}]*\\}"; // @{[^{}]*} with curly braces escaped
    // or: const char *re_str = R"(@\{[^\{\}]*\})";
    try {
        std::regex re(re_str);
        // Each successful search resumes right after the previous match.
        while (std::regex_search(it, str.cend(), matches, re)) {
            std::cout << matches[0] << std::endl;
            it = matches.suffix().first;
        }
    }
    catch (const std::exception& e) {
        std::cout << e.what() << std::endl;
        return 1;
    }
    return 0;
}
Output:
g++ regex_search.cc && ./a.out
@{foo}
@{bar}
It works.
Still, I'm wondering whether there is a better approach from a performance point of view, so I tried std::regex_match instead of iterating with std::regex_search.
For that I used a capture group, simply enclosing the previous regular expression in ()*:
const char *re_str = "(@\\{[^\\{\\}]*\\})*"; // (@{[^{}]*})* with curly braces escaped.
This is the source:
#include <iostream>
#include <string>
#include <regex>

int main () {
    std::string str = "Bye @{foo} ! hi @{bar} !";
    std::smatch matches;
    const char *re_str = "(@\\{[^\\{\\}]*\\})*"; // (@{[^{}]*})* with curly braces escaped.
    try {
        std::regex re(re_str);
        if (std::regex_match(str, matches, re)) {
            // Index 0 is the whole match; higher indices are capture groups.
            for (std::size_t k = 0; k < matches.size(); ++k)
                std::cout << "[" << k << "]: " << matches.str(k) << std::endl;
        }
    }
    catch (const std::exception& e) {
        std::cout << e.what() << std::endl;
        return 1;
    }
    return 0;
}
Output:
g++ regex_match.cc && ./a.out
The output is empty!
I suppose that's not the way to use std::regex_match, although it is supposed to extract matches for captured groups.
Perhaps the regex is invalid this time (I can't tell because, as I said, it is a portability nightmare).
So,
- Is using regex_search good enough, or is performance a real concern with it?
- Is regex_match a better algorithm, or is it equivalent?
- What's wrong with my source for regex_match?
Best regards, and thank you in advance.
CodePudding user response:
std::regex_search searches for the pattern anywhere in the input string. std::regex_match checks whether the pattern matches the entire input string.
Your pattern does not match your entire string, so std::regex_match will not find a match. You would need something like .*?(@\{[^{}]*\}).*?(@\{[^{}]*\}).* if you wanted to match the entire string and extract the @{foo} and @{bar} portions.
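To make the difference concrete, here is a minimal sketch. The helper names (contains_placeholder, is_exactly_placeholder, last_repeated_capture) are made up for illustration and do not come from the question. The third helper also illustrates a related point: even when a ()* pattern does cover the whole input, a repeated capture group only keeps the text of its last iteration, so it would not return every @{...} token anyway.

```cpp
#include <regex>
#include <string>

// Hypothetical helpers for illustration only.

// True if an @{...} token occurs anywhere in the input.
bool contains_placeholder(const std::string& s) {
    static const std::regex re(R"(@\{[^{}]*\})");
    return std::regex_search(s, re);
}

// True only if the whole input is a single @{...} token.
bool is_exactly_placeholder(const std::string& s) {
    static const std::regex re(R"(@\{[^{}]*\})");
    return std::regex_match(s, re);
}

// Matches the whole input against (@{...})* and returns capture group 1.
// Returns an empty string when the input as a whole does not match.
std::string last_repeated_capture(const std::string& s) {
    static const std::regex re(R"((@\{[^{}]*\})*)");
    std::smatch m;
    if (std::regex_match(s, m, re)) {
        return m.str(1); // only the last iteration of the group is kept
    }
    return std::string();
}
```

With the string from the question, contains_placeholder returns true while is_exactly_placeholder returns false. For an input made only of tokens, such as "@{foo}@{bar}", last_repeated_capture returns "@{bar}" and the first token is not available through the capture group.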
CodePudding user response:
- Is using regex_search good enough, or is performance a real concern with it?
- Is regex_match a better algorithm, or is it equivalent?
I wouldn't say performance is the first thing you should take into account when comparing these two functions. Each has a fairly specific application area: you use regex_search when you iteratively match a pattern as a substring of the input, and regex_match when the whole input must match the pattern.
When the input is of arbitrary length and unknown at the time the std::regex is instantiated, or when the pattern required to capture all parts of the input would be overly complicated, I would go with regex_search; otherwise I would choose regex_match (where the required substrings are acquired via capture groups).
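For the iterative case, the standard library also provides std::sregex_iterator, which wraps the repeated regex_search calls. The following is only a sketch of that alternative; the function name extract_placeholders is hypothetical, and this is not a claim that it is faster than the hand-written loop.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every @{...} token found in the input,
// regardless of how many there are.
std::vector<std::string> extract_placeholders(const std::string& input) {
    static const std::regex re(R"(@\{[^{}]*\})");
    std::vector<std::string> found;
    // A default-constructed sregex_iterator acts as the end-of-sequence marker.
    for (std::sregex_iterator it(input.begin(), input.end(), re), end; it != end; ++it) {
        found.push_back(it->str()); // str() with no argument is the whole match
    }
    return found;
}
```

Called with "Bye @{foo} ! hi @{bar} !", this returns the two tokens @{foo} and @{bar}, in that order, and the same code handles inputs with zero or many tokens.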
- What's wrong with my source for regex_match?
As others already said, it needs to match the entire string. The pattern needs to look something like this for your particular string:
std::regex re(R"(.*?(@\{[^{}]*\}).*?(@\{[^{}]*\}).*)");
I assume that's not as flexible as you expect it to be, so you had better opt for the regex_search option you have already implemented.
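As a sketch of how that whole-string pattern could be used, and of why it is inflexible, consider the following. The helper name match_two_placeholders is invented for this example; the pattern is the one shown above and is hard-wired to exactly two capture groups.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the two captured tokens, or an empty vector
// when the whole input does not fit the "at least two tokens" pattern.
std::vector<std::string> match_two_placeholders(const std::string& input) {
    static const std::regex re(R"(.*?(@\{[^{}]*\}).*?(@\{[^{}]*\}).*)");
    std::smatch m;
    std::vector<std::string> groups;
    if (std::regex_match(input, m, re)) {
        // Index 0 is the whole match, so capture groups start at 1.
        for (std::size_t k = 1; k < m.size(); ++k) {
            groups.push_back(m.str(k));
        }
    }
    return groups;
}
```

For the string in the question this yields @{foo} and @{bar}. An input with a single token does not match at all, and a third token would be swallowed by the trailing .* rather than captured, which is the inflexibility mentioned above.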