Suppose I have a string like this:
Beschreibung Menge VK-Preis MwSt% Betrag
Schadenbewertunginkl.Restwertermittlung 1 25,00€ 19 25,00€
Rechnungsbetragexcl.MwSt.: 25,00€
MwSt.(19%): 4,75€
Rechnungsbetragincl.MwSt.: 123.029,75€
I want to extract all the numbers. My regexes are:
regex_up_to_thousand = r'\b(?:\d{1,3}){1}(?:,{1}\d{2})\b'
and
regex_every_price = r'\b(?:\d{1,3}(\.|,)) (:?\d{3}(\.|,))(?:\d{2})\b'
My idea was to first get the "big" prices, remove them from the text, and then get the other numbers. This works in most cases, until the text contains a date like this:
Gutachtennummer: 1009126 Leistungsdatum: 11.10.2021
I would get the 11.10 with my second regex, and I don't know how to prevent this.
I thought the \b would help, but sadly it doesn't.
Any ideas? It's not the end of the world, since I do a lot of math in the background, but it's possible that a date happens to fit some values and I end up calculating something wrong.
CodePudding user response:
You could try the following pattern.
\b\d+(?:(?:\.|,)\d{3})*(?:(?:\.|,)\d{2})\b(?!\W\d)
The main thing is the negative lookahead (?!\W\d) at the end, which ensures that your amount is not followed by one non-word character and then one digit. In 11.10.2021 the candidate 11.10 is followed by .2, so it is rejected.
Example: https://regex101.com/r/q1ic9S/1
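As a quick check in Python, here is a minimal sketch that runs the pattern against the sample text from the question. It assumes the quantifier after the first \d is + (one or more digits). The names AMOUNT, AMOUNT_NO_LOOKAHEAD and text are only for illustration, and the second pattern exists purely to show what the lookahead changes.

```python
import re

# Pattern from this answer; the trailing (?!\W\d) is a negative lookahead.
AMOUNT = re.compile(r'\b\d+(?:(?:\.|,)\d{3})*(?:(?:\.|,)\d{2})\b(?!\W\d)')

# Same pattern without the lookahead, for comparison only.
AMOUNT_NO_LOOKAHEAD = re.compile(r'\b\d+(?:(?:\.|,)\d{3})*(?:(?:\.|,)\d{2})\b')

# Sample text taken from the question, including the line with the date.
text = """Beschreibung Menge VK-Preis MwSt% Betrag
Schadenbewertunginkl.Restwertermittlung 1 25,00€ 19 25,00€
Rechnungsbetragexcl.MwSt.: 25,00€
MwSt.(19%): 4,75€
Rechnungsbetragincl.MwSt.: 123.029,75€
Gutachtennummer: 1009126 Leistungsdatum: 11.10.2021"""

# All groups are non-capturing, so findall returns the full matches.
with_lookahead = AMOUNT.findall(text)
without_lookahead = AMOUNT_NO_LOOKAHEAD.findall(text)

print(with_lookahead)
# ['25,00', '25,00', '25,00', '4,75', '123.029,75']
print(without_lookahead)
# ['25,00', '25,00', '25,00', '4,75', '123.029,75', '11.10']
```

One caveat: the lookahead also rejects a real amount that is directly followed by a single non-word character and a digit, for example 25,00 19 without the € sign in between. In the sample text the € keeps the amounts safe, but it is worth checking against your other documents.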