Regex to find numbers that are not in a phrase

Time:04-22

If I have e.g. this string:

Beschreibung Menge VK-Preis MwSt% Betrag
Schadenbewertunginkl.Restwertermittlung 1 25,00€ 19 25,00€
Rechnungsbetragexcl.MwSt.: 25,00€
MwSt.(19%): 4,75€
Rechnungsbetragincl.MwSt.: 123.029,75€

I want to extract all the numbers. My regexes are:

regex_up_to_thousand = r'\b(?:\d{1,3}){1}(?:,{1}\d{2})\b'

and

regex_every_price = r'\b(?:\d{1,3}(\.|,))+(:?\d{3}(\.|,))(?:\d{2})\b'

My idea was to first get the "big" prices, remove them from the text, and then get the other numbers. This works in most cases, until the text contains a date like this:

Gutachtennummer: 1009126 Leistungsdatum: 11.10.2021

I would get the 11.10 with my second regex, and I don't know how to prevent this. I thought the \b would help, but sadly not.

Any ideas? It's not the end of the world, since I do a lot of math in the background, but a date could happen to fit some of the values, and then I would calculate something wrong in the end.

CodePudding user response:

You could try the following pattern.

\b\d+(?:(?:\.|,)\d{3})*(?:(?:\.|,)\d{2})\b(?!\W\d)

The main thing is the negative lookahead (?!\W\d) at the end, which ensures that your amount is not immediately followed by one non-word character and then a digit. That is why 11.10 in 11.10.2021 is rejected: it is followed by .2.

Example: https://regex101.com/r/q1ic9S/1
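
To try the pattern in Python rather than on regex101, here is a minimal sketch. It assumes the sample text from the question with the date line appended. The names TEXT, AMOUNT and NO_LOOKAHEAD are made up for this illustration, and NO_LOOKAHEAD exists only to show what the lookahead changes.

```python
import re

# Sample text from the question, plus the line containing the date.
TEXT = """Beschreibung Menge VK-Preis MwSt% Betrag
Schadenbewertunginkl.Restwertermittlung 1 25,00€ 19 25,00€
Rechnungsbetragexcl.MwSt.: 25,00€
MwSt.(19%): 4,75€
Rechnungsbetragincl.MwSt.: 123.029,75€
Gutachtennummer: 1009126 Leistungsdatum: 11.10.2021"""

# Pattern from the answer: digits, optional thousands groups, two decimals,
# and no "non-word character + digit" directly afterwards.
AMOUNT = re.compile(r'\b\d+(?:(?:\.|,)\d{3})*(?:(?:\.|,)\d{2})\b(?!\W\d)')

# Same pattern without the lookahead, for comparison only.
NO_LOOKAHEAD = re.compile(r'\b\d+(?:(?:\.|,)\d{3})*(?:(?:\.|,)\d{2})\b')

print(AMOUNT.findall(TEXT))
# ['25,00', '25,00', '25,00', '4,75', '123.029,75']
print(NO_LOOKAHEAD.findall(TEXT))
# ['25,00', '25,00', '25,00', '4,75', '123.029,75', '11.10']
```

One caveat: \W also matches whitespace, so an amount followed directly by a space and a digit (for example 25,00 19 with no € in between) would be skipped as well. If your invoices can look like that, a narrower lookahead such as (?![.,]\d) might be a better fit, but check it against your own data.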
