I'm trying to write a regex that describes a single-quote-delimited string. Inside the string, I can have either any printable (or whitespace) character that is NOT a single quote, OR a sequence of TWO single quotes, which represents an "escaped" single quote.
The [[:print:]] character class (also written as \p{XPosixPrint}) fits the bill for the characters I want to allow, except that it would ALSO allow a lone single quote ('), which I don't want.
So, is there a simple way to do that, such as requiring a character to match two expressions at the same time (like [[:print:]] and [^']), or do I have to create a custom character class enumerating everything I'm allowing (or forbidding)?
CodePudding user response:
/(?!')\p{Print}/ # Worst performance and kinda yuck?
/\p{Print}(?<!')/ # Better performance but yuckier?
/[^\P{Print}']/ # Best performance, but hard to parse.[1]
use experimental qw( regex_sets ); # Still experimental before Perl 5.36, no idea why.
/(?[ \p{Print} - ['] ])/ # Best performance and clearest.
/[^\p{Cn}\p{Co}\p{Cs}\p{Cc}']/ # Non-general solution.
# Best performance but fragile.[2]
\p{Print} is an alias of \p{XPosixPrint}.
[1] char that is (printable and not('))
= char that is (not(not(printable and not('))))
= char that is (not(not(printable) or not(not('))))
= char that is (not(not(printable) or '))
= [^\P{Print}']
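The derivation above can be checked empirically. The following is a minimal sketch, not part of the original answer: it anchors the lookahead form and the double-negated class to a single character and counts the code points on which they disagree. The helper name `disagreements` and the tested range 0x00..0x2FF are arbitrary choices made for illustration.

```perl
use strict;
use warnings;

# Two of the candidate patterns from the answer, anchored to exactly one character.
my $lookahead = qr/\A(?!')\p{Print}\z/;
my $negated   = qr/\A[^\P{Print}']\z/;

# Hypothetical helper: returns the code points in [$from, $to]
# on which the two patterns give different answers.
sub disagreements {
    my ($from, $to) = @_;
    my @bad;
    for my $cp ($from .. $to) {
        my $ch = chr $cp;
        my $x  = $ch =~ $lookahead ? 1 : 0;
        my $y  = $ch =~ $negated   ? 1 : 0;
        push @bad, $cp if $x != $y;
    }
    return @bad;
}

my @bad = disagreements(0x00, 0x2FF);
print "disagreements: ", scalar(@bad), "\n";
```

If the De Morgan rewrite is right, the list of disagreements should be empty for any range you try.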
[2] \p{Print} includes all the characters except unassigned, private use, surrogates and control characters.
/[^\p{Cn}\p{Co}\p{Cs}\p{Cc}']/
is short for
/[^\p{General_Category=Unassigned}\p{General_Category=Private_Use}\p{General_Category=Surrogate}\p{General_Category=Control}']/
or
use experimental qw( regex_sets ); # No idea why still experimental.
/(?[ !( \p{General_Category=Unassigned} + \p{General_Category=Private_Use} + \p{General_Category=Surrogate} + \p{General_Category=Control} + ['] ) ])/
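To tie this back to the original question, here is a minimal sketch of a matcher for the whole quoted string, built on the `[^\P{Print}']` class. The names `$body_char`, `$quoted` and `is_quoted_literal` are made up for this example, and the pattern assumes the entire input must be one literal (hence the `\A` and `\z` anchors).

```perl
use strict;
use warnings;

# One body character: printable (per \p{Print}) but not a single quote.
my $body_char = qr/[^\P{Print}']/;

# A whole literal: opening quote, then any mix of body characters and
# doubled quotes (''), then the closing quote.
my $quoted = qr/\A'(?:$body_char|'')*'\z/;

# Hypothetical helper: returns 1 if $text is a well-formed quoted literal, else 0.
sub is_quoted_literal {
    my ($text) = @_;
    return $text =~ $quoted ? 1 : 0;
}

for my $candidate ("'abc'", "'it''s'", "''", "'it's'", "'abc") {
    printf "%-10s %s\n", $candidate,
        is_quoted_literal($candidate) ? "accepted" : "rejected";
}
```

The first three candidates are well formed (plain text, an escaped quote, and the empty string). The last two are not: one contains a lone quote in the body and the other has no closing quote. Any of the other single-character patterns from the answer could be substituted for `$body_char`.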