My code currently works, but the one flaw is that I didn't write the Python regex in the optimal way.
The original text contains amounts in the thousands, hundreds of thousands, and millions. There is a decimal part, but it is always ",00".
Example line in Text:
Debt 1 of 2 for an amount of: $ 58.610,00, Unpaid
Right now, the following code captures amounts in the millions fine, but for amounts below 100,000 it skips one digit.
regex = r"(\d+).(\d+).(\d+),(\d+)"
match = re.search(regex, line, re.MULTILINE)
print = "$" + match.group(1) + match.group(2) + match.group(3)
It captures like this:
$5860
But target is like this:
$58610
If the amount is in the millions, it is captured fine. I had to write it this way because the currency I'm working with has large amounts, so I constantly handle quantities of that size.
Regards
CodePudding user response:
You can use the following regex to extract your expected matches and remove the thousands separator afterwards:
\$\s?(\d{1,3}(?:\.\d{3})+)(?:,\d+)?(?!\d)
You need to get the Group 1 value, remove the periods from it, and re-append $ at the start. See the regex demo. Details:
\$ - a $ char
\s? - an optional whitespace
(\d{1,3}(?:\.\d{3})+) - Group 1: one to three digits, and then one or more occurrences (since you only want to match thousands and more) of . and three digits
(?:,\d+)? - an optional sequence of a comma and one or more digits
(?!\d) - no digit is allowed immediately on the right.
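For context, the digit most likely goes missing because the dots in the question's pattern are unescaped, so each one can match any character, including a digit. The sketch below assumes the question's pattern used `+` quantifiers (inferred from the reported `$5860` output); the names `loose` and `strict` are only illustrative.

```python
import re

line = 'Debt 1 of 2 for an amount of: $ 58.610,00, Unpaid'

# Unescaped dots: each "." may consume any character, even a digit.
# With only one real "." in the amount, the second "." eats the "1".
loose = re.search(r'(\d+).(\d+).(\d+),(\d+)', line)
print(loose.groups())  # ('58', '6', '0', '00')

# Escaped dots: two literal periods are now required, so an amount
# with a single thousands separator does not match at all.
strict = re.search(r'(\d+)\.(\d+)\.(\d+),(\d+)', line)
print(strict)  # None
```

Escaping the dots alone is therefore not enough: the number of thousands groups has to be variable, which is what the repeated `(?:\.\d{3})+` group provides.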
See the Python demo:
import re
text = 'Debt 1 of 2 for an amount of: $ 58.610,00, Unpaid'
match = re.search(r'\$\s?(\d{1,3}(?:\.\d{3})+)(?:,\d+)?(?!\d)', text)
if match:
    print(f"${match.group(1).replace('.', '')}")
# => $58610
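If a line can contain several amounts, or you need the value as a number rather than a string, the same pattern could be wrapped in a small helper. This is only a sketch: `AMOUNT_RE` and `parse_amounts` are hypothetical names, and the second sample line is made up to show a millions value.

```python
import re

# Same pattern as above, compiled once for reuse.
AMOUNT_RE = re.compile(r'\$\s?(\d{1,3}(?:\.\d{3})+)(?:,\d+)?(?!\d)')

def parse_amounts(text):
    """Return every matched amount as an int, ignoring the ',00' part."""
    return [int(m.group(1).replace('.', '')) for m in AMOUNT_RE.finditer(text)]

print(parse_amounts('Debt 1 of 2 for an amount of: $ 58.610,00, Unpaid'))
# => [58610]
print(parse_amounts('Total: $ 1.250.000,00 and $ 3.400,00'))
# => [1250000, 3400]
```

Returning integers drops the decimal part entirely, which seems safe here only because the question states it is always ",00".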