Home > front end >  Regex conditionals in Python
Regex conditionals in Python

Time:01-12

I am trying to match a number if certain conditions are met: if word1 is followed by word2 before the number, I'd like to match the first number that comes before word1; if word1 is followed by the number before word2, then I'd like the first occurance of the number that is followed by word1.

I have tried it using two opposite lookarounds: (?=regex)then|(?!regex)else but it did not work. Can someone tell me what I am doing wrong?

(?=(word1[\s\S] ?word2[\s\S] ?[\d\.]{1,},\d{2}\s EUR))((?<=word4)[\s\S] ?([\d\.]{1,},\d{2})\s EUR[\s\S] ?(?=word1))|(?!word1[\s\S] ?[\d\.]{1,},\d{2}\s EUR[\s\S] ?word2)word1[\s\S] ?([\d\.]{1,},\d{2})\s EUR

Here are some concrete examples:

case1

\n28.000,00 EUR\nword3\n308,24 EUR\nword4\nword5\nword1\n2.096,64 EUR\nword2\n308,24 EUR expected match: 2.096,64

case2

\n28.000,00 EUR\nword3\n308,24 EUR\nword4\n2.096,64 EUR\nword5\nword1\nword2\n308,24 EUR expected match: 2.096,64

I have used word4 in my regex as an anchor in order to match the first number before word1. word4 is not a requirement and can be disregarded.

https://regex101.com/r/F8uQZU/2

CodePudding user response:

You might use 2 capture groups to get the values, with an alternation | for both of the cases:

(\d{1,3}(?:\.\d{3})*,\d{2}) EUR(?:(?!\d{1,3}(?:\.\d{3})*,\d{2}).)*word1\\nword2|word1(?:(?!\\nword2).)*?(\d{1,3}(?:\.\d{3})*,\d{2}) EUR

The pattern matches:

  • ( Capture group 1
    • \d{1,3} Match 1-3 digits
    • (?:\.\d{3})* Optionally repeat matching . and 3 digits
    • ,\d{2} Match , and 2 digits
  • ) EUR Close capture group , and match EUR` after it
  • (?: Non capture group
    • (?!\d{1,3}(?:\.\d{3})*,\d{2}). Negative lookahead, assert not the number format to the right, and if so, match any char except a newline using the dot
  • )* Close the non capture group and optionally repeat it to match all characters
  • word1\\nword2 Match word1\nword2
  • | Or
  • word1 Match literally
  • (?:(?!\\nword2).)*? Negative lookahead, assert not \nword2 to the right, and if so, match any char except a newline using the dot. Repeat the outer non capture group again as least as possible, making the quantifier non greedy
  • (\d{1,3}(?:\.\d{3})*,\d{2}) EUR Capture the number format in group 2 and match the following EUR

Regex demo

  •  Tags:  
  • Related