I am trying to validate a list of names, among other things, with a regex pattern.
It works so far, except when I test the limits. A name may be at most 128 characters long. If the name is quite long and ends with a character that is defined in an inner group, such as a separator (e.g. a space or a punctuation mark), catastrophic backtracking occurs. I don't quite understand why, because I would assume that group one (?:[\p{L}\p{Nd}\p{Ps}]) must be there once, that the group (?:\p{Zs}\p{P}|\p{P}\p{Zs}|[\p{P}\p{Zs}])? is optional, and that the group (?:[\p{L}\p{Nd}\p{Pe}.]) has to match at the end. The last two groups can occur repeatedly.
Full pattern
^(?!.{129})(?!.["])(?:[\p{L}\p{Nd}\p{Ps}])+(?:(?:\p{Zs}\p{P}|\p{P}\p{Zs}|[\p{P}\p{Zs}])?(?:[\p{L}\p{Nd}\p{Pe}.]))*$
Tests & Samples
https://regex101.com/r/6E0Khd/1
Answer:
You need to rewrite the pattern so that consecutive parts of the regex cannot match at the same location in the string.
You can use
^(?!.{129})(?!.")[\p{L}\p{Nd}\p{Ps}][\p{L}\p{Nd}\p{Pe}.]*(?:(?:\p{Zs}\p{P}?|\p{P}\p{Zs}?)[\p{L}\p{Nd}\p{Pe}.]+)*$
See the regex demo.
Your regex was
^<Lookahead_1><Lookahead_2><P_I>+(?:<OPT_SEP>?<P_II>)*$
You need to make sure the string only starts with a char that matches the <P_I> pattern; the rest of the chars can match the <P_II> pattern. So, it should look like
^<Lookahead_1><Lookahead_2><P_I><P_II>*(?:<SEP><P_II>+)*$
Note that the P_I pattern is now used to match the first char only, the P_II pattern is added right after P_I to match zero or more chars matching that pattern, the SEP pattern is now obligatory, and the P_II pattern inside the group is quantified with +.
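The restructuring above can be sketched in TypeScript by assembling the pattern from its named parts. This is only a sketch under a few assumptions: the constant names and the isValidName helper are made up for illustration, the quantifier after the last <P_II> is taken to be +, and the JavaScript engine is assumed to support Unicode property escapes with the u flag.

```typescript
// Building blocks named after the placeholders used in the answer.
// All identifiers here are illustrative, not part of the original post.
const LOOKAHEAD_1 = "(?!.{129})"; // fail if there are 129 or more characters
const LOOKAHEAD_2 = "(?!.\")"; // copied from the answer: fail if the 2nd character is a double quote
const P_I = "[\\p{L}\\p{Nd}\\p{Ps}]"; // allowed first character
const P_II = "[\\p{L}\\p{Nd}\\p{Pe}.]"; // allowed non-separator character
const SEP = "(?:\\p{Zs}\\p{P}?|\\p{P}\\p{Zs}?)"; // obligatory separator, 1 or 2 characters

// ^<Lookahead_1><Lookahead_2><P_I><P_II>*(?:<SEP><P_II>+)*$
const NAME_SOURCE =
  "^" + LOOKAHEAD_1 + LOOKAHEAD_2 + P_I + P_II + "*(?:" + SEP + P_II + "+)*$";

// The "u" flag is required for \p{...} property escapes.
const NAME_PATTERN = new RegExp(NAME_SOURCE, "u");

function isValidName(name: string): boolean {
  return NAME_PATTERN.test(name);
}
```

With this layout, a name such as "Smith, John" is accepted, while "John " is rejected right away: the trailing space can only be consumed by SEP, and SEP must be followed by at least one P_II character.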
I also shrunk the (?:\p{Zs}\p{P}|\p{P}\p{Zs}|[\p{P}\p{Zs}]) pattern into (?:\p{Zs}\p{P}?|\p{P}\p{Zs}?): it matches either a space separator followed by an optional punctuation character, or a punctuation character followed by an optional space separator.
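To get some confidence that the shorter separator accepts the same strings as the longer one, both can be anchored and compared on every short string over a small sample alphabet. The following TypeScript sketch is a spot check, not a proof; the helper names and the sample alphabet are assumptions chosen for illustration.

```typescript
// Both separator variants, anchored so that they must match the whole input.
const ORIGINAL_SEP = new RegExp("^(?:\\p{Zs}\\p{P}|\\p{P}\\p{Zs}|[\\p{P}\\p{Zs}])$", "u");
const SHRUNK_SEP = new RegExp("^(?:\\p{Zs}\\p{P}?|\\p{P}\\p{Zs}?)$", "u");

// Enumerate every string of length 0..maxLen over the given alphabet.
function allStrings(alphabet: string[], maxLen: number): string[] {
  const result: string[] = [""];
  let current: string[] = [""];
  for (let len = 1; len <= maxLen; len++) {
    const next: string[] = [];
    for (let i = 0; i < current.length; i++) {
      for (let j = 0; j < alphabet.length; j++) {
        next.push(current[i] + alphabet[j]);
      }
    }
    for (let k = 0; k < next.length; k++) {
      result.push(next[k]);
    }
    current = next;
  }
  return result;
}

// Return the samples on which the two variants disagree.
function disagreements(samples: string[]): string[] {
  const out: string[] = [];
  for (let i = 0; i < samples.length; i++) {
    if (ORIGINAL_SEP.test(samples[i]) !== SHRUNK_SEP.test(samples[i])) {
      out.push(samples[i]);
    }
  }
  return out;
}

// Sample alphabet: space, no-break space, comma, hyphen, and a letter.
const SAMPLE_ALPHABET = [" ", "\u00A0", ",", "-", "a"];
const mismatches = disagreements(allStrings(SAMPLE_ALPHABET, 3));
```

An empty mismatches array means the two variants agree on every sample. The shrunk form has the additional benefit that its two alternatives start with different character classes, so the engine never has to try both at the same position.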
Note that \p{Zs} does not match a TAB char; you may want to use [\p{Zs}\t] instead.
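The difference is easy to check in isolation. In this small TypeScript sketch (the constant and function names are hypothetical), a tab is rejected by \p{Zs} alone because it belongs to the control category, and is accepted once \t is added to the class.

```typescript
// \p{Zs} covers space separators such as U+0020 and U+00A0, but not TAB (U+0009).
const SPACE_ONLY = new RegExp("^\\p{Zs}$", "u");
// Adding \t to the character class accepts TAB as well.
const SPACE_OR_TAB = new RegExp("^[\\p{Zs}\\t]$", "u");

// Check a single character against either class.
function isSeparatorChar(ch: string, allowTab: boolean): boolean {
  return (allowTab ? SPACE_OR_TAB : SPACE_ONLY).test(ch);
}
```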