Is there a way to have a character match a conjunction of character classes?

Time:10-20

I'm trying to have a regex describe a single-quote-delimited string. Inside the string, I can have either any printable (or whitespace) character (which is NOT a single quote), OR a series of TWO single quotes, which would be an "escaped" single quote.

The [[:print:]] character class (also written as \p{XPosixPrint}) fits the bill for the characters I want to allow... except that it would ALSO allow a single "single quote" ('). Which I don't want to happen.

So, is there a simple way to do that, such as requiring a character to match two expressions at the same time (like [[:print:]] and [^']), or do I have to create a custom character class enumerating everything I'm allowing (or forbidding)?

CodePudding user response:

/(?!')\p{Print}/                     # Worst performance and kinda yuck?
/\p{Print}(?<!')/                    # Better performance but yuckier?
/[^\P{Print}']/                      # Best performance, but hard to parse.[1]
use experimental qw( regex_sets );   # No idea why still experimental.
/(?[ \p{Print} - ['] ])/             # Best performance and clearest.
/[^\p{Cn}\p{Co}\p{Cs}\p{Cc}']/       # Non-general solution.
                                     # Best performance but fragile.[2]

\p{Print} is an alias of \p{XPosixPrint}.
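
To tie this back to the question, here is a minimal sketch of the whole single-quote-delimited string pattern, built on the negated-class form above. The names `$quoted_re` and `is_quoted_string` are made up for illustration, and the `\A`/`\z` anchors assume the entire input should be exactly one quoted string; adjust if you are matching inside a larger text.

```perl
use strict;
use warnings;

# Hypothetical pattern: an opening quote, then any number of
# "printable but not a quote" characters or doubled quotes,
# then the closing quote.
my $quoted_re = qr/
    \A '
    (?: [^\P{Print}']    # one printable character other than the quote
      | ''               # or a doubled quote, standing for one literal quote
    )*
    ' \z
/x;

# Hypothetical helper: returns 1 if $s is a well-formed quoted string, else 0.
sub is_quoted_string {
    my ($s) = @_;
    return $s =~ $quoted_re ? 1 : 0;
}

for my $candidate ("'abc'", "'it''s'", "'it's'", "''") {
    printf "%-10s %s\n", $candidate,
        is_quoted_string($candidate) ? "accepted" : "rejected";
}
```

Any of the other single-character forms could be dropped into the alternation in place of `[^\P{Print}']`.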


  1.    char that is (printable and not('))
     = char that is (not(not(printable and not('))))
     = char that is (not(not(printable) or not(not('))))
     = char that is (not(not(printable) or '))
     = [^\P{Print}']
    
  2. \p{Print} includes all the characters except unassigned, private use, surrogates and control characters.

    /[^\p{Cn}\p{Co}\p{Cs}\p{Cc}']/
    

    is short for

    /[^\p{General_Category=Unassigned}\p{General_Category=Private_Use}\p{General_Category=Surrogate}\p{General_Category=Control}']/
    

    or

    use experimental qw( regex_sets );   # No idea why still experimental.
    /(?[ !(
         \p{General_Category=Unassigned}
       + \p{General_Category=Private_Use}
       + \p{General_Category=Surrogate}
       + \p{General_Category=Control}
       + [']
    ) ])/
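
As a sanity check on the derivation in footnote 1, the three non-experimental spellings can be compared by brute force. This is only a sketch: the `%forms` hash and `count_disagreements` are names invented here, and the scanned range is an arbitrary choice, not a proof over all of Unicode.

```perl
use strict;
use warnings;

# Three spellings of "printable and not a single quote",
# each anchored so it must match exactly one character.
my %forms = (
    lookahead  => qr/\A(?!')\p{Print}\z/,
    lookbehind => qr/\A\p{Print}(?<!')\z/,
    negated    => qr/\A[^\P{Print}']\z/,
);

# Hypothetical helper: counts code points in 0..$last_cp on which
# the spellings do not all give the same answer.
sub count_disagreements {
    my ($last_cp) = @_;
    my $bad = 0;
    for my $cp (0 .. $last_cp) {
        my $ch = chr $cp;
        my @results = map { $ch =~ $_ ? 1 : 0 } values %forms;
        $bad++ if grep { $_ != $results[0] } @results;
    }
    return $bad;
}

print "disagreements: ", count_disagreements(0x2FFF), "\n";
```

If the De Morgan rewrite is right, the count should be zero for whatever range you scan.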
    