Regex python do lookahead in a conditional statement

Time:11-07

I'm trying to use lookaheads inside a conditional in a Python regex. In words:

(The first string must be either a number (decimal or not) or a single word character, and it is captured in a named group.) (If the named group holds a word character, a lookahead checks that the next string is a number (decimal or not); otherwise a lookahead checks that the next string is a word character.)

Here are some examples of what should and should not match:

a 6 or 6.4 b -> matched, since the first and second strings don't have the same "type"

ab 7 or 7 rt -> not matched, only a single word character is allowed

R 7.55t -> not matched, 7.55t is not a valid number

a r or 5 6 -> not matched, the first and second strings have the same "type" (number and number, or word character and word character)

I've already found a pattern for the first string: (?P<var>([a-zA-Z]|(-?\d+(.\d+)?)))

I've found nothing on the Internet about lookaheads in a conditional in Python.

The problem is that Python doesn't support lookaround conditionals the way PCRE does:

Python supports conditionals using a numbered or named capturing group. Python does not support conditionals using lookaround, even though Python does support lookaround outside conditionals. Instead of a conditional like (?(?=regex)then|else), you can alternate two opposite lookarounds: (?=regex)then|(?!regex)else. (source: https://www.regular-expressions.info/conditional.html)
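To make the quoted workaround concrete, here is a minimal sketch. The patterns are illustrative toys, not the ones from the question. It checks that Python's re rejects a lookaround used as a condition, emulates it with two opposite lookarounds, and shows a group-based conditional, which Python does accept.

```python
import re

# Python's re rejects a lookaround as the condition of (?(...)yes|no).
try:
    re.compile(r"(?(?=\d)\d+|[a-z]+)")
    lookaround_condition_ok = True
except re.error:
    lookaround_condition_ok = False

# Emulation: the lookahead guards one branch, its negation guards the other.
emulated = re.compile(r"(?:(?=\d)\d+|(?!\d)[a-z]+)")

# A group-based conditional is supported: ">" is required only if "<" matched.
grouped = re.compile(r"(<)?\w+(?(1)>)")

print(lookaround_condition_ok)
print(bool(emulated.fullmatch("123")), bool(emulated.fullmatch("abc")))
print(bool(grouped.fullmatch("<tag>")), bool(grouped.fullmatch("<tag")))
```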

Maybe there's a better solution than the one I've planned, or maybe what I want is simply impossible. I don't know.

What I tried: (?P<var>([a-zA-Z]|(-?\d+(.\d+)?))) (?(?=[a-zA-Z])(?=(-?\d+(.\d+)?))|(?=[a-zA-Z]))(?P=var) but that doesn't work.

CodePudding user response:

The named capture group (?P<var>...) contains the actual text that matched, not the regex itself. There is a way to create a named regex, too; but it's probably not particularly necessary or useful here.
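A small sketch of that point, using a simplified pattern of my own rather than the one from the question: the backreference (?P=var) only matches a repeat of the exact text the group captured, so it cannot stand in for "another token of the same kind".

```python
import re

# (?P=var) must match the same text that (?P<var>...) captured.
backref = re.compile(r"(?P<var>[a-zA-Z]|\d+) (?P=var)")

for text in ["a a", "a b", "7 7", "7 8"]:
    print(repr(text), bool(backref.fullmatch(text)))
```

Only the inputs that repeat the captured text ("a a" and "7 7") match; "a b" and "7 8" do not, even though both tokens fit the group's pattern.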

Simply spell out the alternatives:

((?<![a-zA-Z0-9])[a-zA-Z]\s+-?\d+(\.\d+)?(?![a-zA-Z.0-9])|(?<![a-zA-Z.0-9])-?\d+(\.\d+)?\s+[a-zA-Z](?![a-zA-Z0-9]))
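The pattern can be checked against the examples from the question with a short script. This is a sketch, not the linked demo: it assumes the quantifiers are `+` (one or more digits, one or more whitespace characters) and that the decimal point is escaped as `\.`, and the name `PAIR` is made up for illustration.

```python
import re

# Assumed reading of the pattern: "+" quantifiers and an escaped decimal point.
PAIR = re.compile(
    r"((?<![a-zA-Z0-9])[a-zA-Z]\s+-?\d+(\.\d+)?(?![a-zA-Z.0-9])"
    r"|(?<![a-zA-Z.0-9])-?\d+(\.\d+)?\s+[a-zA-Z](?![a-zA-Z0-9]))"
)

# Examples taken from the question.
should_match = ["a 6", "6.4 b"]
should_fail = ["ab 7", "7 rt", "R 7.55t", "a r", "5 6"]

for text in should_match + should_fail:
    print(repr(text), bool(PAIR.search(text)))
```

The lookbehinds and lookaheads at the edges are what reject "ab 7" and "R 7.55t": they keep the match from starting or ending in the middle of a longer word or number.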

If you genuinely require the second token to remain unmatched, it should be obvious how to change the parts starting at each \s into a lookahead.
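One possible way to make that change, as a sketch under the same assumptions (`+` quantifiers, escaped decimal point; the name `FIRST_ONLY` is hypothetical): everything from the whitespace onward moves into a positive lookahead, so only the first token is consumed.

```python
import re

# The second token is only checked by the lookahead, not consumed.
FIRST_ONLY = re.compile(
    r"(?<![a-zA-Z0-9])[a-zA-Z](?=\s+-?\d+(?:\.\d+)?(?![a-zA-Z.0-9]))"
    r"|(?<![a-zA-Z.0-9])-?\d+(?:\.\d+)?(?=\s+[a-zA-Z](?![a-zA-Z0-9]))"
)

for text in ["a 6", "6.4 b", "a r", "R 7.55t"]:
    m = FIRST_ONLY.search(text)
    print(repr(text), m.group(0) if m else None)
```

For "a 6" the match is just "a", and for "6.4 b" it is "6.4"; the other two inputs produce no match.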

Demo: https://ideone.com/nPNAIN
