I'm trying to figure out how to write a regex to match this pattern:
测试1003##$%#测试
The structure is: Chinese characters, then non-Chinese characters, then Chinese characters. The non-Chinese part can be anything, and the Chinese characters are always the same (测试).
I know we can use ^((?!\p{Han}).)*$
to match non-Chinese characters, but I'm not sure how to make sure the head and tail are always the same Chinese characters (测试 in this case).
CodePudding user response:
Use
^(\p{Han}+)\P{Han}*\g{1}$
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
^                the beginning of the string
--------------------------------------------------------------------------------
(                group and capture to \1:
--------------------------------------------------------------------------------
  \p{Han}+       Chinese (Han) characters (1 or more times,
                 matching as many as possible)
--------------------------------------------------------------------------------
)                end of \1
--------------------------------------------------------------------------------
\P{Han}*         non-Chinese characters (0 or more times,
                 matching as many as possible)
--------------------------------------------------------------------------------
\g{1}            matches the same text as most recently matched
                 by the 1st capturing group
--------------------------------------------------------------------------------
$                before an optional \n, and the end of the
                 string
If the prefix and suffix are always 测试, then use
^测试\P{Han}*测试$
Or, if the suffix and prefix can include more Chinese characters:
^测试\p{Han}*\P{Han}*\p{Han}*测试$
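The patterns above can be tried out with a short Python sketch. This is an illustration under assumptions, not a drop-in equivalent: Python's standard `re` module does not support `\p{Han}` or `\g{1}`, so the sketch approximates `\p{Han}` with the CJK Unified Ideographs range U+4E00 to U+9FFF (which does not cover every Han character) and writes the backreference as `\1`. The names `HAN`, `same_head_tail` and `fixed_head_tail` are made up for this example.

```python
import re

# Assumption: approximate \p{Han} with the CJK Unified Ideographs block,
# because the standard-library `re` module has no \p{...} script classes.
HAN = r"\u4e00-\u9fff"

# Approximates ^(\p{Han}+)\P{Han}*\g{1}$ : the tail must repeat the head.
same_head_tail = re.compile(rf"^([{HAN}]+)[^{HAN}]*\1$")

# Approximates ^测试\P{Han}*测试$ : head and tail are the literal 测试.
fixed_head_tail = re.compile(rf"^测试[^{HAN}]*测试$")

samples = ["测试1003##$%#测试", "测试1003##$%#中文", "测试abc"]
for s in samples:
    # Print whether each pattern accepts the sample string.
    print(s, bool(same_head_tail.search(s)), bool(fixed_head_tail.search(s)))
```

With an engine that supports Unicode script properties (PCRE, or the third-party `regex` package for Python), the original `\p{Han}` form can be used directly instead of the range.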
CodePudding user response:
If there should be at least a single character other than \p{Han}, you can match \P{Han}.
Capture the \p{Han} chars in capture group 1, and add a backreference to group 1 at the end.
^(\p{Han}+)\P{Han}.*\1$
^            Start of string
(\p{Han}+)   Capture group 1, match 1+ chars in the Han script
\P{Han}      Match at least one char other than \p{Han}
.*           Match the rest of the string
\1$          Match a backreference to group 1 at the end of the string
To also match 测试 on its own, you can use:
^(\p{Han}+)(?:\P{Han}.*\1)?$
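A rough way to compare the required-middle and optional-middle variants is the Python sketch below. It rests on an assumption: the standard `re` module has no `\p{Han}`, so the character range U+4E00 to U+9FFF stands in for it and will miss Han characters outside that block. The names `HAN`, `required_middle` and `optional_middle` are hypothetical.

```python
import re

# Assumption: the CJK Unified Ideographs block stands in for \p{Han}.
HAN = r"\u4e00-\u9fff"

# Approximates ^(\p{Han}+)\P{Han}.*\1$ : at least one non-Han char is required.
required_middle = re.compile(rf"^([{HAN}]+)[^{HAN}].*\1$")

# Approximates ^(\p{Han}+)(?:\P{Han}.*\1)?$ : the middle and tail are optional.
optional_middle = re.compile(rf"^([{HAN}]+)(?:[^{HAN}].*\1)?$")

for s in ["测试1003##$%#测试", "测试", "测试1003"]:
    # Print whether each variant accepts the sample string.
    print(s, bool(required_middle.search(s)), bool(optional_middle.search(s)))
```

Note that because the middle is matched with `.*`, both variants allow further Han characters between the first non-Han character and the repeated tail, for example 测试1中文测试.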