I am trying to match a number depending on a condition: if word1 is followed by word2 before the next number, I'd like to match the first number that comes before word1; if word1 is followed by a number before word2, I'd like to match the first number that follows word1.
I have tried it using two opposite lookarounds: (?=regex)then|(?!regex)else but it did not work. Can someone tell me what I am doing wrong?
(?=(word1[\s\S]*?word2[\s\S]*?[\d\.]{1,},\d{2}\s*EUR))((?<=word4)[\s\S]*?([\d\.]{1,},\d{2})\s*EUR[\s\S]*?(?=word1))|(?!word1[\s\S]*?[\d\.]{1,},\d{2}\s*EUR[\s\S]*?word2)word1[\s\S]*?([\d\.]{1,},\d{2})\s*EUR
Here are some concrete examples:
Case 1:
\n28.000,00 EUR\nword3\n308,24 EUR\nword4\nword5\nword1\n2.096,64 EUR\nword2\n308,24 EUR
Expected match: 2.096,64
Case 2:
\n28.000,00 EUR\nword3\n308,24 EUR\nword4\n2.096,64 EUR\nword5\nword1\nword2\n308,24 EUR
Expected match: 2.096,64
I have used word4 in my regex as an anchor in order to match the first number before word1. word4 is not a requirement and can be disregarded.
https://regex101.com/r/F8uQZU/2
CodePudding user response:
You might use 2 capture groups to get the values, with an alternation | for both of the cases:
(\d{1,3}(?:\.\d{3})*,\d{2}) EUR(?:(?!\d{1,3}(?:\.\d{3})*,\d{2}).)*word1\\nword2|word1(?:(?!\\nword2).)*?(\d{1,3}(?:\.\d{3})*,\d{2}) EUR
The pattern matches:
- `(` Capture group 1
  - `\d{1,3}` Match 1-3 digits
  - `(?:\.\d{3})*` Optionally repeat matching `.` and 3 digits
  - `,\d{2}` Match `,` and 2 digits
- `) EUR` Close capture group, and match ` EUR` after it
- `(?:` Non capture group
  - `(?!\d{1,3}(?:\.\d{3})*,\d{2}).` Negative lookahead, assert not the number format to the right, and if so, match any char except a newline using the dot
- `)*` Close the non capture group and optionally repeat it to match all characters
- `word1\\nword2` Match `word1\nword2`
- `|` Or
- `word1` Match literally
- `(?:(?!\\nword2).)*?` Negative lookahead, assert not `\nword2` to the right, and if so, match any char except a newline using the dot. Repeat the outer non capture group as few times as possible, making the quantifier non greedy
- `(\d{1,3}(?:\.\d{3})*,\d{2}) EUR` Capture the number format in group 2 and match the following ` EUR`
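To try the pattern outside regex101, here is a minimal Python sketch that runs it against the two sample strings from the question. The names `NUM`, `PATTERN`, `PATTERN_NL` and `find_amount` are only illustrative and not part of the original answer. It assumes, as in the samples, that `\n` appears as the two literal characters backslash and n. The second pattern is an assumed adaptation for input with real line breaks: it swaps `\\n` for `\n` and adds `re.DOTALL` so the dot can cross lines.

```python
import re

# Number format from the answer: 1-3 digits, optional ".ddd" groups, then ",dd".
NUM = r"\d{1,3}(?:\.\d{3})*,\d{2}"

# The answer's pattern, assembled from NUM. "\\n" matches the two literal
# characters backslash + n, as they appear in the question's sample strings.
PATTERN = re.compile(
    rf"({NUM}) EUR(?:(?!{NUM}).)*word1\\nword2"
    rf"|word1(?:(?!\\nword2).)*?({NUM}) EUR"
)

# Assumed variant for text with real newlines: match "\n" instead of the
# literal backslash-n, and let the dot match newlines via re.DOTALL.
PATTERN_NL = re.compile(PATTERN.pattern.replace(r"\\n", r"\n"), re.DOTALL)


def find_amount(text, pattern=PATTERN):
    """Return the amount captured by whichever alternative matched, or None."""
    m = pattern.search(text)
    if m is None:
        return None
    # Only one of the two groups participates in a given match.
    return m.group(1) or m.group(2)


# Sample strings from the question (raw strings keep the literal backslash-n).
case1 = r"\n28.000,00 EUR\nword3\n308,24 EUR\nword4\nword5\nword1\n2.096,64 EUR\nword2\n308,24 EUR"
case2 = r"\n28.000,00 EUR\nword3\n308,24 EUR\nword4\n2.096,64 EUR\nword5\nword1\nword2\n308,24 EUR"

print(find_amount(case1))  # second alternative, group 2
print(find_amount(case2))  # first alternative, group 1

# Same data with real line breaks, checked with the adapted pattern.
real1 = case1.replace(r"\n", "\n")
real2 = case2.replace(r"\n", "\n")
print(find_amount(real1, PATTERN_NL))
print(find_amount(real2, PATTERN_NL))
```

Because the value ends up in group 1 or group 2 depending on which alternative matched, the helper returns whichever group is set.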