Re2 unicode character class

Time:09-28

The RE2 syntax page says:

\pF Unicode character class F (one-letter name)

Where exactly is that section covered? For example, below on the page there is a section called:

Unicode character class names--general category

But the names listed in that section are one or two letters long.

Are both forms allowed? What is an example of what would and would not be accepted?

https://github.com/google/re2/wiki/Syntax/

CodePudding user response:

As far as I know, it means exactly what it says. General category names are one or two letters long, but only the one-letter names can be written without braces: \pL. With braces, you can specify any general category or a script name: \p{L}, \p{Cc}, \p{Greek}.
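A minimal sketch of the brace rule, using Go's standard regexp package rather than the C++ RE2 library itself (an assumption made for convenience: Go documents its syntax as the one RE2 accepts, but the two are separate implementations). The helper fullMatch is a hypothetical name introduced only for this illustration. Note what happens to \pLu: it is not rejected, it is read as \pL followed by a literal u.

```go
package main

import (
	"fmt"
	"regexp"
)

// fullMatch reports whether the entire string s matches pattern.
// Hypothetical helper for this illustration only.
func fullMatch(pattern, s string) bool {
	return regexp.MustCompile(`^(?:` + pattern + `)$`).MatchString(s)
}

func main() {
	// One-letter general category: braces are optional.
	fmt.Println(fullMatch(`\pL`, "é"))   // true
	fmt.Println(fullMatch(`\p{L}`, "é")) // true

	// Two-letter general category: braces are required.
	fmt.Println(fullMatch(`\p{Lu}`, "É")) // true
	fmt.Println(fullMatch(`\p{Lu}`, "é")) // false (lowercase letter)

	// Script name: braces are required.
	fmt.Println(fullMatch(`\p{Greek}`, "λ")) // true

	// Without braces, \pLu means "\pL, then the literal letter u".
	fmt.Println(fullMatch(`\pLu`, "É"))  // false
	fmt.Println(fullMatch(`\pLu`, "Éu")) // true
}
```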

From the Internationalisation section in Regular expression matching in the wild:

For internationalized character classes, RE2 implements the Unicode 5.2 General Category property (e.g., \pN or \p{Lu}) as well as the Unicode Script property (e.g., \p{Greek}). These should be used whenever matches are not intended to be limited to ASCII characters (e.g., \pN or \p{Nd} instead of [[:digit:]] or \d). RE2 does not implement the other Unicode properties...
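To illustrate the advice in that quote, here is a small sketch comparing the ASCII-only digit classes with the Unicode ones. It again assumes Go's regexp package as a stand-in for RE2; Go's Unicode tables follow a newer Unicode version than the 5.2 mentioned in the quote, but the three sample characters below are classified the same way in both. The variable names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical names for this illustration; each pattern is anchored
// so that it must match the whole one-character input.
var (
	asciiDigit   = regexp.MustCompile(`^\d$`)
	posixDigit   = regexp.MustCompile(`^[[:digit:]]$`)
	decimalDigit = regexp.MustCompile(`^\p{Nd}$`)
	anyNumber    = regexp.MustCompile(`^\pN$`)
)

func main() {
	samples := []string{
		"7", // ASCII digit: matched by all four patterns
		"٣", // ARABIC-INDIC DIGIT THREE (Nd): matched only by \p{Nd} and \pN
		"½", // VULGAR FRACTION ONE HALF (No): matched only by \pN
	}
	for _, s := range samples {
		fmt.Printf("%q: \\d=%v [[:digit:]]=%v \\p{Nd}=%v \\pN=%v\n",
			s,
			asciiDigit.MatchString(s),
			posixDigit.MatchString(s),
			decimalDigit.MatchString(s),
			anyNumber.MatchString(s))
	}
}
```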

Looking at the code, it appears that more properties are supported if you build with ICU support. Either way, you need braces for any property name longer than one character.
