Python's re module does not have atomic grouping; however, it can be emulated, for example see this.
In the plot, len(data) represents an increasing number of sentences (strings formed by 60 words). The code to reproduce it can be found here.
Is my assumption incorrect? On a more general note, how can I write a regular expression (in Python) that tries only one of the branches in an alternation and none of the others?
CodePudding user response:
Your assumption is not correct. The whole point of atomic patterns is to prevent backtracking into the pattern.
In your code, the atomic_group pattern is of the (?=(...))\1 type and the non-atomic one is of the (?:...) type. So, the first one already loses to the second one due to the use of a capturing group; see capturing group VS non-capturing group.
Besides, the atomic_group pattern has to match the same text twice: first with the lookahead, then with the backreference.
So, only use this technique when you need to control backtracking inside a longer pattern.
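To illustrate the "only one branch of the alternation" part of the question, here is a minimal sketch of the lookahead-plus-backreference emulation. The branches foo|foobar and the test string are made up for the demonstration and are not taken from your code.

```python
import re

# Plain non-capturing alternation: when "$" fails after the first branch,
# the engine backtracks into the group and tries the second branch.
plain = re.compile(r"(?:foo|foobar)$")

# Emulated atomic group: the lookahead captures whatever the alternation
# matches first, and \1 then consumes exactly that text. Once the lookahead
# has succeeded, the engine does not re-enter it to try other branches.
atomic = re.compile(r"(?=(foo|foobar))\1$")

plain_match = plain.match("foobar")
atomic_match = atomic.match("foobar")

print(plain_match.group(0) if plain_match else None)  # foobar
print(atomic_match)                                   # None
```

The emulated version commits to the first branch that matches (foo) and fails at "$" instead of retrying with foobar, which is the behavior an atomic group would give. If you can rely on Python 3.11 or newer, re accepts the native (?>foo|foobar) syntax, which avoids both the capturing group and the double match.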