How do I capture a price with thousand and decimal separator with regex in python

Time:11-12

I currently have working code; the only flaw is that I didn't write the Python regex in the optimal way.

The original text contains amounts in the thousands, hundreds of thousands, and millions, with no meaningful decimals. The decimal part is present, but it is always ",00".

Example line in Text:

Debt 1 of 2 for an amount of: $ 58.610,00, Unpaid

Right now the following code captures millions fine, but for amounts below 100,000 it skips one digit.

regex = r"(\d+).(\d+).(\d+),(\d+)"
match = re.search(regex, line, re.MULTILINE)
print("$" + match.group(1) + match.group(2) + match.group(3))

It captures like this:

$5860

But target is like this:

$58610

If the amount is in the millions, it is captured fine. I wrote it that way because the currency I work with involves large amounts, so I constantly deal with quantities of that size.

Regards

CodePudding user response:

You can use the following regex to extract your expected matches and remove the thousand separator afterwards:

\$\s?(\d{1,3}(?:\.\d{3})+)(?:,\d+)?(?!\d)

You need to take the Group 1 value, remove the periods from it, and prepend $ at the start. Details:

  • \$ - a $ char
  • \s? - an optional whitespace
  • (\d{1,3}(?:\.\d{3})+) - Group 1: one to three digits, and then one or more occurrences (since you only want to match thousands and more) of . and three digits
  • (?:,\d+)? - an optional sequence of a comma and one or more digits
  • (?!\d) - no digit is allowed immediately on the right.
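
To illustrate why escaping the dot matters, here is a minimal sketch that runs both patterns on the example line. It assumes the question's pattern was meant to use `\d+` in each group; in that pattern the unescaped `.` matches any character, so backtracking lets it swallow digits.

```python
import re

line = "Debt 1 of 2 for an amount of: $ 58.610,00, Unpaid"

# Pattern from the question (assumed to use \d+): the unescaped "."
# matches any character, so backtracking lets it consume digits.
loose = re.search(r"(\d+).(\d+).(\d+),(\d+)", line)
print(loose.groups())  # ('58', '6', '0', '00')

# Escaping the dot pins it to the literal thousands separator.
strict = re.search(r"\$\s?(\d{1,3}(?:\.\d{3})+)(?:,\d+)?(?!\d)", line)
print(strict.group(1))  # 58.610
```

Joining the first three groups of the loose match gives "5860", which is the missing-digit result described in the question.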

See the Python demo:

import re
text = 'Debt 1 of 2 for an amount of: $ 58.610,00, Unpaid'
match = re.search(r'\$\s?(\d{1,3}(?:\.\d{3})+)(?:,\d+)?(?!\d)', text)
if match:
    print(f"${match.group(1).replace('.', '')}")

# => $58610
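
If a text contains several amounts, the same pattern could be applied with `finditer`. The sketch below is only an illustration: the helper name `extract_amounts` and the second sample line (an amount in the millions) are made up for this example and are not from the question.

```python
import re

# Same pattern as above, compiled once for reuse.
AMOUNT_RE = re.compile(r"\$\s?(\d{1,3}(?:\.\d{3})+)(?:,\d+)?(?!\d)")

def extract_amounts(text):
    """Hypothetical helper: return each matched amount as an int,
    with the thousands separators removed and the decimals dropped."""
    return [int(m.group(1).replace(".", "")) for m in AMOUNT_RE.finditer(text)]

# The second line is invented sample data to exercise the millions case.
sample = (
    "Debt 1 of 2 for an amount of: $ 58.610,00, Unpaid\n"
    "Debt 2 of 2 for an amount of: $ 1.258.610,00, Unpaid"
)
print(extract_amounts(sample))  # [58610, 1258610]
```

Because the pattern requires at least one thousands group, amounts below 1.000 are not matched by this sketch.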