I am trying to remove non-printable characters using std::regex and the [:print:] character class.
The input string could look like this:
"\nTesting\t regex and \n\n\t printable characters \a\b set \0\f"
Here \n, \t, \a, \b, \0, \f are non-printable characters. I want to remove all non-printable characters except \n and \t.
std::regex nonprintable_regex("(^[[:print:]]+)");
std::smatch sm;
if (std::regex_search(str, sm, nonprintable_regex)) {
str = std::regex_replace(str, nonprintable_regex, "");
}
But I am not getting the expected result, which is:
"\nTesting\t regex and \n\n\t printable characters set "
I know I have to add something for \n and \t, but I have no idea how to add that condition. Any pointers or help would be appreciated. Thanks.
CodePudding user response:
You needn't test for a match first; the regex_search call here is redundant.
The ^ anchor only matches at the start of the string, so you are trying to match one or more printable chars at the start of the string, which is not what you want.
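To see what the pattern from the question actually matches, here is a small sketch. It assumes the quantifier in that pattern was + (the "one or more" wording above suggests so) and that the default ECMAScript grammar is used; apply_original_pattern is a hypothetical helper name used only for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: applies the pattern from the question
// (assumed here to be "(^[[:print:]]+)") and deletes whatever it matches.
std::string apply_original_pattern(const std::string& input) {
    // ^ is an anchor here, not a negation: it ties the match to the start of the string.
    static const std::regex original("(^[[:print:]]+)");
    return std::regex_replace(input, original, "");
}
```

With an input such as "abc\adef" this should strip the leading printable run "abc" and leave "\adef", bell character included. An input that starts with \n, like the sample string, should come back unchanged because there is no printable char at position 0.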
To match any non-printable char you need to use [^[:print:]], a negated bracket expression that matches any char but a printable char.
You can use
std::regex nonprintable_regex("(?![\n\t])[^[:print:]]");
// Or
std::regex nonprintable_regex("[^[:print:]\n\t]+");
See the C++ demo:
std::string str( "\nTesting\t regex and \n\n\t printable characters \a\b set \0\f" );
std::regex nonprintable_regex("(?![\n\t])[^[:print:]]");
str = std::regex_replace(str, nonprintable_regex, "");
std::cout << str << std::endl;
The (?![\t\n]) negative lookahead restricts what [^[:print:]] can match: it can no longer match tabs and newlines.
Another way is to include \n and \t in the negated bracket expression itself, [^[:print:]\n\t], which avoids the lookahead and should be even faster.
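Putting both variants side by side, here is a minimal sketch. The helper names strip_nonprintable, strip_nonprintable_lookahead and sample_input are hypothetical and not part of the original code. One caveat: a std::string constructed from a plain string literal stops at the first \0, so the trailing \0\f of the sample never reaches the regex unless the length is passed explicitly, which is what sample_input does. The sketch also assumes the default "C" locale for [:print:].

```cpp
#include <regex>
#include <string>

// Hypothetical helper: removes every non-printable char except '\n' and '\t'
// using the negated bracket expression that lists the two exceptions.
std::string strip_nonprintable(const std::string& input) {
    static const std::regex nonprintable("[^[:print:]\n\t]+");
    return std::regex_replace(input, nonprintable, "");
}

// Hypothetical helper: same goal, using the negative lookahead variant.
std::string strip_nonprintable_lookahead(const std::string& input) {
    static const std::regex nonprintable("(?![\n\t])[^[:print:]]");
    return std::regex_replace(input, nonprintable, "");
}

// Builds the sample input with an explicit length so that the embedded
// '\0' and the '\f' after it are kept in the std::string.
std::string sample_input() {
    static const char raw[] =
        "\nTesting\t regex and \n\n\t printable characters \a\b set \0\f";
    return std::string(raw, sizeof(raw) - 1);  // -1 drops the literal's terminator
}
```

Calling strip_nonprintable(sample_input()) or strip_nonprintable_lookahead(sample_input()) should give the same string: \a, \b, \0 and \f are gone, while every \n and \t stays in place. Note that the two spaces that surrounded \a\b both remain, since a space is a printable character.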