How to extract fraction numbers with decimals in numerator and/or denominator?

Time:10-27

I have the following function that finds integers, decimals, and fractions (it keeps leading zeros and detects the sign of each number, as I want it to):

import re

def extract_numbers(text):
    numbers = re.findall(r"[-+]?\d*\.\d+|[-+]?\d*/\d+|[-+]?\d+", text)
    return numbers

The problem occurs when I test it with a fraction number that has decimals in the numerator or the denominator:

print(extract_numbers('this is difficult to get: -124.01/11.1'))

Output:

['-124.01', '/11', '.1']

But I need it to be:

['-124.01/11.1']

So how can I adjust the regex to extract the numbers with this priority: fractions with decimals first, then fractions, then decimal numbers, and finally integers?

Answer:

You can use

[-+]?\d*\.?\d+(?:/\d*\.?\d+)?

See the regex demo. Details:

  • [-+]? - an optional sign
  • \d* - zero or more digits
  • \.? - an optional period
  • \d+ - one or more digits
  • (?:/\d*\.?\d+)? - an optional sequence of
    • / - a / char
    • \d*\.?\d+ - zero or more digits, an optional period and one or more digits.
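Here is a minimal sketch of dropping this pattern into the question's function, assuming Python's standard re module. The only group is non-capturing, so re.findall returns the whole matches. The second sample string is my own illustration.

```python
import re

# Optional sign, a number with an optional decimal part, then an optional
# "/denominator" of the same shape. (?:...) is non-capturing, so findall
# returns full matches rather than group contents.
NUMBER_RE = re.compile(r"[-+]?\d*\.?\d+(?:/\d*\.?\d+)?")


def extract_numbers(text):
    return NUMBER_RE.findall(text)


print(extract_numbers('this is difficult to get: -124.01/11.1'))
# ['-124.01/11.1']
print(extract_numbers('1/2, +3.5 and 0042'))
# ['1/2', '+3.5', '0042']
```

Because the fraction part is optional and tried greedily, a single alternative covers the whole priority list in the question: a fraction (with or without decimals) is taken when present, otherwise a decimal or an integer.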

Answer:

You may use the following regular expression to extract the desired strings (when used with Python's PyPI regex package).

(?<!([.\d/]))-?(\d+(?:\.\d+)?)(?:\/(?2))?(?!(?1))

Regex demo <¯\_(ツ)_/¯> Python demo

The regex operates as follows.

(?<!          # begin negative lookbehind
  (           # begin capture group 1
    [.\d/]    # match one of the indicated characters
  )           # end capture group 1
)             # end negative lookbehind
-?            # optionally match '-'
(             # begin capture group 2
  \d+         # match 1+ digits
  (?:\.\d+)?  # optionally match '.' followed by 1+ digits
)             # end capture group 2
(?:           # begin non-capture group
  \/          # match '/' 
  (?2)        # recurse subpattern 2
)?            # end non-capture group and make it optional
(?!           # begin negative lookahead
  (?1)        # recurse subpattern 1
)             # end negative lookahead

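Notice at the links how the negative lookbehind at the beginning and the negative lookahead at the end keep a match from starting right after, or ending right before, a period, digit or slash. As a result, no fragment of a malformed string such as 1.2.3 or 5/6/7 is returned.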
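The effect of the two lookarounds can be checked with the standard re module. Once the subroutine calls are written out by hand, the pattern needs no regex-specific features, and the lookbehind [.\d/] is fixed-width. The sketch below compares the pattern with and without the lookarounds. The written-out patterns and the variable names UNGUARDED and GUARDED are my own expansion, not the answer's code.

```python
import re

# The answer's pattern with (?2) expanded by hand and the lookarounds removed.
UNGUARDED = r"-?\d+(?:\.\d+)?(?:/\d+(?:\.\d+)?)?"
# The same pattern with (?1) expanded: reject a match that starts right after,
# or ends right before, a '.', a digit or a '/'.
GUARDED = r"(?<![.\d/])-?\d+(?:\.\d+)?(?:/\d+(?:\.\d+)?)?(?![.\d/])"

for text in ["1.2.3", "5/6/7", "-124.01/11.1"]:
    print(text, re.findall(UNGUARDED, text), re.findall(GUARDED, text))
# 1.2.3 ['1.2', '3'] []
# 5/6/7 ['5/6', '7'] []
# -124.01/11.1 ['-124.01/11.1'] ['-124.01/11.1']
```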

I could have used Python's re module, at the expense of a longer regular expression, but chose to use Matthew Barnett's "regex package" to illustrate how subroutines can be used to simplify expressions. Note the "Regex demo" link uses the PCRE engine, which is similar to the PyPI regex engine.
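Here is a sketch of the re-only route. It assumes ordinary Python string formatting as a stand-in for subroutines: each reused piece is kept in a variable and spliced into the pattern wherever (?1) or (?2) would appear. The names EDGE, NUM and PATTERN are hypothetical.

```python
import re

# re has no (?1)/(?2) subroutine calls, so each reused piece lives in a
# Python string and is inserted wherever it is needed.
EDGE = r"[.\d/]"          # stands in for subpattern 1
NUM = r"\d+(?:\.\d+)?"    # stands in for subpattern 2

PATTERN = re.compile(rf"(?<!{EDGE})-?{NUM}(?:/{NUM})?(?!{EDGE})")


def extract_numbers(text):
    # The pattern has no capturing groups, so findall returns whole matches.
    return PATTERN.findall(text)


print(PATTERN.pattern)
# (?<![.\d/])-?\d+(?:\.\d+)?(?:/\d+(?:\.\d+)?)?(?![.\d/])
print(extract_numbers('this is difficult to get: -124.01/11.1'))
# ['-124.01/11.1']
```

Unlike the regex-package version, this pattern has no capturing groups. That matters because re.findall returns group contents instead of full matches whenever a pattern contains groups. With the answer's pattern, finditer and match.group() are the safer way to collect results.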

The parts of the pattern that define capture groups 1 and 2, and the subroutine calls that later reuse them, are marked below (y marks the call (?2) and x marks the call (?1)).

(?<!([.\d/]))-?(\d+(?:\.\d+)?)(?:\/(?2))?(?!(?1))
     111111     2222222222222      yyyy     xxxx

The subroutine call (?2) acts as if it were replaced by the subpattern \d+(?:\.\d+)?, so it matches any text of that shape, not the text that capture group 2 happened to capture. Similarly, (?1) acts as if it were replaced by [.\d/]. Subroutines are also used for recursion but, as here, they can simply be used to avoid repetition, which reduces errors and makes expressions easier to read.
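Python's re module has no subroutine calls, but the distinction drawn above can still be illustrated with it. A backreference such as \1 must repeat the exact text the group captured. Re-inserting the subpattern, which is what (?2) amounts to, accepts any number of the same shape. The sketch below is my own illustration, and the names NUM, same_text and same_shape are hypothetical.

```python
import re

NUM = r"\d+(?:\.\d+)?"

# Backreference: \1 must match the exact text that group 1 captured.
same_text = re.compile(rf"({NUM})/\1")
# Subpattern repeated (what (?2) does in regex/PCRE): any number of that shape.
same_shape = re.compile(rf"({NUM})/{NUM}")

for s in ["11.1/11.1", "124.01/11.1"]:
    print(s, bool(same_text.fullmatch(s)), bool(same_shape.fullmatch(s)))
# 11.1/11.1 True True
# 124.01/11.1 False True
```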
