I'm trying to use lookaheads inside a conditional in a Python regex. In words:
(a specified string that has to be either a number (decimal or not) or a single word character, captured in a named group) (if the named group captured a word character, then check with a lookahead that the next string is a number (decimal or not); otherwise check with a lookahead that the next string is a word character)
To illustrate, here are some examples that should or should not match:
a 6
or 6.4 b
-> matched, since the first and the second string do not have the same "type"
ab 7
or 7 rt
-> not matched, since only a single word character is allowed
R 7.55t
-> not matched, 7.55t is not a valid number
a r
or 5 6
-> not matched, since the first and the second string have the same "type" (number and number, or word character and word character)
I've already found the answer for the first string: (?P<var>([a-zA-Z]|(-?\d+(.\d+)?)))
I've found nothing on the Internet about lookaheads inside a conditional in Python.
The problem is that Python doesn't support lookaround conditionals the way PCRE does:
Python supports conditionals using a numbered or named capturing group. Python does not support conditionals using lookaround, even though Python does support lookaround outside conditionals. Instead of a conditional like (?(?=regex)then|else), you can alternate two opposite lookarounds: (?=regex)then|(?!regex)else. (source: https://www.regular-expressions.info/conditional.html)
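A minimal sketch of the alternation trick from that quote, using a toy rule that is only an illustration and not the final pattern: "if the next character is a digit, match an integer, else match a single letter". The names `pattern` and `matches_token` are made up for this example.

```python
import re

# Emulates the PCRE-style conditional (?(?=\d)\d+|[a-zA-Z]) by
# alternating two opposite lookarounds, as the quoted article suggests.
pattern = re.compile(r"(?:(?=\d)\d+|(?!\d)[a-zA-Z])")

def matches_token(text):
    """Return True if the whole text is an integer or a single letter."""
    return pattern.fullmatch(text) is not None

print(matches_token("42"))  # "then" branch: starts with a digit
print(matches_token("x"))   # "else" branch: does not start with a digit
print(matches_token("4x"))  # neither branch can consume the whole text
```

The two lookarounds are mutually exclusive, so at any position only one branch can apply, which is what the conditional would have guaranteed.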
Maybe there's a better solution than the one I had planned, or maybe what I want is simply impossible; I don't know.
What I tried: (?P<var>([a-zA-Z]|(-?\d+(.\d+)?))) (?(?=[a-zA-Z])(?=(-?\d+(.\d+)?))|(?=[a-zA-Z]))(?P=var)
but that doesn't work.
CodePudding user response:
The named capture group (?P<var>...) contains the actual text which matched, not the regex itself. There is a way to create a named regex, too; but it's probably not particularly necessary or useful here.
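To see why the backreference cannot work here, a small sketch (the pattern is a simplified version of the one in the question, with the decimal point escaped, and `repeats_first_token` is just an illustrative name): (?P=var) only matches a literal repeat of whatever text the group captured.

```python
import re

# (?P=var) requires the exact text captured by (?P<var>...) to appear again;
# it does not re-apply the group's sub-pattern.
pattern = re.compile(r"(?P<var>[a-zA-Z]|-?\d+(?:\.\d+)?) (?P=var)")

def repeats_first_token(text):
    """Return True only if the second token is identical to the first."""
    return pattern.fullmatch(text) is not None

print(repeats_first_token("a a"))      # same text twice
print(repeats_first_token("6.4 6.4"))  # same text twice
print(repeats_first_token("a 6"))      # different text, so no match
```

So a pattern ending in (?P=var) can only ever accept inputs like "a a", which is the opposite of what the question asks for.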
Simply spell out the alternatives:
((?<![a-zA-Z0-9])[a-zA-Z]\s+-?\d+(\.\d+)?(?![a-zA-Z.0-9])|(?<![a-zA-Z.0-9])-?\d+(\.\d+)?\s+[a-zA-Z](?![a-zA-Z0-9]))
If you genuinely require the second token to remain unmatched, it should be obvious how to change the parts starting at each \s+ into a lookahead.
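Here is a rough sketch that runs the spelled-out alternatives against the examples from the question. It assumes the quantifiers are \d+ and \s+ and that the decimal point is escaped; the names `BOTH_TOKENS`, `FIRST_ONLY` and `is_mixed_pair` are only for illustration.

```python
import re

# Letter followed by a number, or number followed by a letter.
BOTH_TOKENS = re.compile(
    r"(?<![a-zA-Z0-9])[a-zA-Z]\s+-?\d+(?:\.\d+)?(?![a-zA-Z.0-9])"
    r"|(?<![a-zA-Z.0-9])-?\d+(?:\.\d+)?\s+[a-zA-Z](?![a-zA-Z0-9])"
)

# Variant: the part starting at \s+ is moved into a lookahead,
# so only the first token is consumed by the match.
FIRST_ONLY = re.compile(
    r"(?<![a-zA-Z0-9])[a-zA-Z](?=\s+-?\d+(?:\.\d+)?(?![a-zA-Z.0-9]))"
    r"|(?<![a-zA-Z.0-9])-?\d+(?:\.\d+)?(?=\s+[a-zA-Z](?![a-zA-Z0-9]))"
)

def is_mixed_pair(text):
    """Return True if text contains a letter/number pair of differing types."""
    return BOTH_TOKENS.search(text) is not None

for sample in ["a 6", "6.4 b", "ab 7", "7 rt", "R 7.55t", "a r", "5 6"]:
    print(repr(sample), is_mixed_pair(sample))
```

With the lookahead variant, `FIRST_ONLY.search("6.4 b").group()` gives back just the first token, and the second token is left available for whatever comes next in a larger pattern.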