Regex captures more digits than I defined. Mistake explanation

Time:11-10

I am trying to capture all numbers with the following format:

  1. 1-5 digits (and not more!), not starting with 0
  2. Then either . or ,
  3. Then 2-3 digits
  4. Optionally a ,
  5. Optionally more digits

I have the following regex: (?<!\d)[\d]{1,5}(?!\d)[.,][\d]{2,3}[,]*[\d]*

and it should match:

7,93
8.32
20,43
100.23
2.800
1.597,72
2.026,88
33.000
33.000,43
100.000
150,000
150.000,50

what it should not match:

7.3.2011 
07.03.2011
3.2011

I have tested my regex with the following example string:

7.3.2011  zwischen 7,93 und 10,53 EUR Dienstbeginn: 07.03.2011 

or in code:

import re
string = '7.3.2011  zwischen 7,93 und 10,53 EUR Dienstbeginn: 07.03.2011'
salary = r"(?<!\d)[\d]{1,5}(?!\d)[.,][\d]{2,3}[,]*[\d]*" 
print(re.findall(salary, string))

Unfortunately, it matched 3.2011 and 07.03. I don't understand why it matched 3.2011: I specified that after the first . it should match 2-3 digits, but it matched 4. It shouldn't match 07.03 either, because 07.03.2011 is in the wrong format (one I don't want to match).

Can you explain what I did wrong, and correct my mistake?

CodePudding user response:

You can exclude digits and dots directly to the left and right of the match, and optionally match a comma followed by 1 or more digits.

Note that \d by itself does not have to be between square brackets: [\d] is the same as \d.
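To see why the pattern from the question over-matches, it can help to split it into named groups and look at which part consumes which characters. The sketch below does that; the group names (head, sep, mid, commas, tail) are made up for illustration and are not part of the original pattern.

```python
import re

text = '7.3.2011  zwischen 7,93 und 10,53 EUR Dienstbeginn: 07.03.2011'

# The pattern from the question, unchanged.
original = r"(?<!\d)[\d]{1,5}(?!\d)[.,][\d]{2,3}[,]*[\d]*"

# Same pattern with each part in a named group (names are illustrative only).
labelled = (r"(?<!\d)(?P<head>\d{1,5})(?!\d)(?P<sep>[.,])"
            r"(?P<mid>\d{2,3})(?P<commas>,*)(?P<tail>\d*)")

original_matches = re.findall(original, text)
parts = [m.groupdict() for m in re.finditer(labelled, text)]

print(original_matches)  # ['3.2011', '7,93', '10,53', '07.03']
for p in parts:
    print(p)
```

For 3.2011 the mid group holds 201 and the tail group holds the remaining 1: the trailing [\d]* is unbounded, so it picks up any digits that \d{2,3} left behind. The (?<!\d) lookbehind only rejects a preceding digit, not a preceding dot, which is why a match can start at the 3 in 7.3.2011. Nothing in the pattern restricts what follows the match, so 07.03 is accepted even though .2011 comes right after it.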

(?<![\d.])\d{1,5}[.,]\d{2,3}(?:,\d+)?(?![\d.])

Explanation

  • (?<![\d.]) Assert not either a digit or . to the left
  • \d{1,5} Match 1-5 digits
  • [.,]\d{2,3} Match either . or , and 2-3 digits
  • (?:,\d+)? Optionally match , and 1 or more digits
  • (?![\d.]) Assert not either a digit or . to the right
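As a quick check, the pattern can be run against the example string and the two lists from the question. This is only a sketch: the variable names are arbitrary, and the strict variant at the end is an assumption on my part for the "not starting with 0" requirement, which the pattern above does not enforce.

```python
import re

pattern = re.compile(r"(?<![\d.])\d{1,5}[.,]\d{2,3}(?:,\d+)?(?![\d.])")

should_match = ["7,93", "8.32", "20,43", "100.23", "2.800", "1.597,72",
                "2.026,88", "33.000", "33.000,43", "100.000", "150,000",
                "150.000,50"]
should_not_match = ["7.3.2011", "07.03.2011", "3.2011"]

text = '7.3.2011  zwischen 7,93 und 10,53 EUR Dienstbeginn: 07.03.2011'
found = pattern.findall(text)
print(found)  # ['7,93', '10,53']

# Every wanted value should be matched completely, no unwanted value at all.
missed = [s for s in should_match if not pattern.fullmatch(s)]
wrongly_matched = [s for s in should_not_match if pattern.search(s)]
print(missed, wrongly_matched)  # [] []

# Assumed variant: first digit must be 1-9, followed by up to 4 more digits.
strict = re.compile(r"(?<![\d.])[1-9]\d{0,4}[.,]\d{2,3}(?:,\d+)?(?![\d.])")
strict_found = strict.findall(text)
print(strict_found)  # ['7,93', '10,53']
```

With the strict variant a value such as 0,50 is no longer matched, so use it only if a leading zero should really be rejected.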

See a regex demo.
