In the re2 syntax, it says:
\pF
Unicode character class F (one-letter name)
Where exactly is that section covered? For example, below on the page there is a section called:
Unicode character class names--general category
But the names listed there are one OR two letters long. For example:
Are both allowed, or what's an example of what would and would not be allowed?
https://github.com/google/re2/wiki/Syntax/
CodePudding user response:
As far as I know, it still means what it says. General categories are one or two characters, but only the single-character ones can be specified without braces: \pL. If you use braces, you can specify any general category or a script name: \p{L}, \p{Cc}, \p{Greek}.
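Here is a quick way to check this yourself. It is only a sketch, and it uses Go's regexp package rather than the C++ RE2 library, on the assumption that the two behave the same here (Go documents its syntax as the one accepted by RE2). The helper name fullMatch is made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// fullMatch is a hypothetical helper: it reports whether pattern
// matches the whole of s, not just a substring.
func fullMatch(pattern, s string) bool {
	re := regexp.MustCompile(`^(?:` + pattern + `)$`)
	return re.MatchString(s)
}

func main() {
	fmt.Println(fullMatch(`\pL`, "ж"))       // one-letter category, no braces
	fmt.Println(fullMatch(`\p{L}`, "ж"))     // same category, with braces
	fmt.Println(fullMatch(`\p{Lu}`, "Ж"))    // two-letter category, braces required
	fmt.Println(fullMatch(`\p{Lu}`, "ж"))    // lowercase letter is not in Lu
	fmt.Println(fullMatch(`\p{Greek}`, "λ")) // script name, braces required

	// Without braces only the first letter is read as the class name,
	// so \pLu means "any letter" followed by a literal 'u'.
	fmt.Println(fullMatch(`\pLu`, "Au"))
	fmt.Println(fullMatch(`\pLu`, "A"))
}
```

The last two lines are the interesting ones: \pLu is accepted, but it is not the same thing as \p{Lu}.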
From the Internationalisation section in Regular expression matching in the wild:
For internationalized character classes, RE2 implements the Unicode 5.2 General Category property (e.g., \pN or \p{Lu}) as well as the Unicode Script property (e.g., \p{Greek}). These should be used whenever matches are not intended to be limited to ASCII characters (e.g., \pN or \p{Nd} instead of [[:digit:]] or \d). RE2 does not implement the other Unicode properties...
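To illustrate the ASCII point from the quoted passage, here is a small sketch, again using Go's regexp package as a stand-in for RE2 (an assumption on my part, based on Go documenting the same syntax). The helper name matchesWhole is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesWhole is a hypothetical helper: it reports whether pattern
// matches the whole of s.
func matchesWhole(pattern, s string) bool {
	re := regexp.MustCompile(`^(?:` + pattern + `)$`)
	return re.MatchString(s)
}

func main() {
	devanagariThree := "\u0969" // DEVANAGARI DIGIT THREE, category Nd
	romanEight := "\u2167"      // ROMAN NUMERAL EIGHT, category Nl

	// \d and [[:digit:]] are limited to ASCII 0-9.
	fmt.Println(matchesWhole(`\d`, "3"))                      // true
	fmt.Println(matchesWhole(`\d`, devanagariThree))          // false
	fmt.Println(matchesWhole(`[[:digit:]]`, devanagariThree)) // false

	// \p{Nd} covers decimal digits in any script.
	fmt.Println(matchesWhole(`\p{Nd}`, devanagariThree)) // true

	// \pN covers every number category (Nd, Nl, No), \p{Nd} only decimal digits.
	fmt.Println(matchesWhole(`\pN`, romanEight))    // true
	fmt.Println(matchesWhole(`\p{Nd}`, romanEight)) // false
}
```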
Looking at the code, it appears that if you build with ICU support, more properties are supported. But you need braces for property names longer than one character.