I am trying to capture all numbers with the following format:
- Digits with a length of 1-5 (and not more!), not starting with 0
- Next goes either . or ,
- Next goes digits of length 2-3
- Optionally goes ,
- Optionally goes digits
I have the following regex: (?<!\d)[\d]{1,5}(?!\d)[.,][\d]{2,3}[,]*[\d]*
and it should match:
7,93
8.32
20,43
100.23
2.800
1.597,72
2.026,88
33.000
33.000,43
100.000
150,000
150.000,50
What it should not match:
7.3.2011
07.03.2011
3.2011
I have tested my regex with the following example string:
7.3.2011 zwischen 7,93 und 10,53 EUR Dienstbeginn: 07.03.2011
or in code:
import re
string = '7.3.2011 zwischen 7,93 und 10,53 EUR Dienstbeginn: 07.03.2011'
salary = r"(?<!\d)[\d]{1,5}(?!\d)[.,][\d]{2,3}[,]*[\d]*"
print(re.findall(salary, string))
Unfortunately it matched 3.2011 and 07.03. I don't understand why it matched 3.2011. I specified that after the first . it should match 2-3 digits, but it matched 4. It shouldn't match 07.03 either, because 07.03.2011 has the wrong format (one that I don't want to match).
Can you explain what I did wrong? Can you please correct my mistake?
CodePudding user response:
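To see why the original pattern produced those two matches, it can help to wrap its parts in capture groups and print what each part consumed. This is only an illustrative sketch: the group names (head, frac, commas, tail) are made up here and are not part of the regex in the question, and the redundant square brackets are dropped.

```python
import re

# Original pattern from the question, with hypothetical named groups added
# purely for inspection; the matching behaviour is unchanged.
original = re.compile(
    r"(?<!\d)(?P<head>\d{1,5})(?!\d)[.,](?P<frac>\d{2,3})(?P<commas>,*)(?P<tail>\d*)"
)

text = "7.3.2011 zwischen 7,93 und 10,53 EUR Dienstbeginn: 07.03.2011"

# Print every match together with the text each group consumed.
for m in original.finditer(text):
    print(m.group(0), m.groupdict())
```

For 3.2011, frac takes 201 and the unbounded trailing \d* (tail) takes the remaining 1, which is how four digits end up after the dot. The lookbehind (?<!\d) only rejects a preceding digit, so a match may start right after a dot. Nothing checks what follows the match either, so 07.03 is accepted even though .2011 comes next.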
You can exclude digits and dots directly to the left and right of the match, and optionally match a comma followed by 1 or more digits.
Note that \d by itself does not have to be between square brackets, so [\d]* can be written as \d*.
(?<![\d.])\d{1,5}[.,]\d{2,3}(?:,\d+)?(?![\d.])
Explanation
- (?<![\d.]) Assert that there is neither a digit nor a . directly to the left
- \d{1,5} Match 1-5 digits
- [.,]\d{2,3} Match either . or , followed by 2-3 digits
- (?:,\d+)? Optionally match , and 1 or more digits
- (?![\d.]) Assert that there is neither a digit nor a . directly to the right
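The pattern can be checked against the example string and the sample values from the question with a short script. This is a minimal sketch rather than a definitive implementation. The name strict and its [1-9]\d{0,4} prefix are an assumption added here for the "not starting with 0" requirement, which the lookarounds alone do not enforce.

```python
import re

# Pattern from the answer: no digit or dot directly before or after the match.
salary = re.compile(r"(?<![\d.])\d{1,5}[.,]\d{2,3}(?:,\d+)?(?![\d.])")

# Hypothetical variant (not part of the answer) that also rejects a leading 0.
strict = re.compile(r"(?<![\d.])[1-9]\d{0,4}[.,]\d{2,3}(?:,\d+)?(?![\d.])")

text = "7.3.2011 zwischen 7,93 und 10,53 EUR Dienstbeginn: 07.03.2011"
print(salary.findall(text))

# Sample values taken from the question.
should_match = ["7,93", "8.32", "20,43", "100.23", "2.800", "1.597,72",
                "2.026,88", "33.000", "33.000,43", "100.000", "150,000",
                "150.000,50"]
should_not_match = ["7.3.2011", "07.03.2011", "3.2011"]

for pattern in (salary, strict):
    for value in should_match:
        assert pattern.fullmatch(value), value
    for value in should_not_match:
        assert pattern.search(value) is None, value
```

For the example string this prints ['7,93', '10,53']. A value such as 07,93 is still matched by salary but not by strict, so the variant is only needed if leading zeros can appear outside of dates.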