How to extract a specific type of number from a string using regex?

Time:11-09

Consider this string:

text = '''
4 500,5

12%

1,63%

568768,74832 days in between

34 cars in a row'''

As you can see, there are simple numbers, numbers with spaces in between, numbers with commas, and numbers with both. Here, 4 500,5 counts as a single, standalone number. Extracting the numbers with commas and spaces is easy, and I found this pattern for it:

pattern = re.compile(r'(\d+ )?\d+,\d+')

However, I am struggling to extract just the simple numbers like 12 and 34. I tried using (?!...) and [^...] but these options do not allow me to exclude the unwanted parts of other numbers.

CodePudding user response:

((?:\d+ )?\d+,\d+)|(\d+(?! \d))

I believe this will do what you want (Regexr link: https://regexr.com/695tc)

To capture "simple" numbers, it looks for [one or more digits], which are not followed by [a space and another digit].

I edited so that you can use capture groups appropriately, if desired.
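
As a rough sketch of how the two capture groups could be used from Python (the variable names `comma_numbers` and `simple_numbers` are illustrative, not from the answer), the pattern is written here with its `+` quantifiers and run with `finditer` so each match can be sorted by the group that matched:

```python
import re

text = '''
4 500,5

12%

1,63%

568768,74832 days in between

34 cars in a row'''

# Group 1: numbers containing a comma, optionally with a space-separated prefix.
# Group 2: plain digit runs that are not followed by a space and another digit.
pattern = re.compile(r'((?:\d+ )?\d+,\d+)|(\d+(?! \d))')

comma_numbers = []   # hypothetical name: matches from group 1
simple_numbers = []  # hypothetical name: matches from group 2
for match in pattern.finditer(text):
    if match.group(1) is not None:
        comma_numbers.append(match.group(1))
    else:
        simple_numbers.append(match.group(2))

print(comma_numbers)   # ['4 500,5', '1,63', '568768,74832']
print(simple_numbers)  # ['12', '34']
```

Because the comma alternative comes first, it consumes the digits of numbers like 4 500,5 before the second alternative gets a chance to see them. `finditer` is used rather than `findall` because, with two groups, `findall` would return tuples with an empty string for the group that did not match.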

CodePudding user response:

If you only want to match 12 and 34:

(?<!\S)\d+\b(?![^\S\n]*[,\d])
  • (?<!\S) Assert a whitespace boundary to the left
  • \d+\b Match 1+ digits followed by a word boundary
  • (?! Negative lookahead, assert that what is directly to the right is not
    • [^\S\n]*[,\d] Optional whitespace other than a newline, followed by either , or a digit
  • ) Close lookahead
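
A minimal Python sketch of this pattern applied to the string from the question (the name `simple_only` is made up for illustration). Since the pattern has no capture groups, `findall` returns the full matches directly:

```python
import re

text = '''
4 500,5

12%

1,63%

568768,74832 days in between

34 cars in a row'''

# Digits that start at a whitespace boundary and are not followed by
# (optional same-line whitespace plus) a comma or another digit.
simple_only = re.compile(r'(?<!\S)\d+\b(?![^\S\n]*[,\d])')

result = simple_only.findall(text)
print(result)  # ['12', '34']

# [^\S\n] excludes the newline, so a number on the next line does not
# disqualify the current one.
two_lines = simple_only.findall('7\n8')
print(two_lines)  # ['7', '8']
```

Using `[^\S\n]*` rather than `\s*` in the lookahead is what keeps the check on a single line; with `\s*`, the 7 in the second example would be rejected because a digit follows on the next line.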
