Override python built-in float() behavior

Time:12-16

Old title: Convert scientific notation string without 'e' to float in Python

I use a program that allows some odd formats. One format for real numbers is scientific notation without the letter 'E'. For example, "-1.67E-6" could be written as "-1.67-6". Obviously, float() doesn't accept this. I am writing many classes that need this same check in multiple fields, so I need a function that does it cleanly. Is there a way to override the builtin float() so that it handles this format? I think that would be ideal, but I'm not sure it's possible.

My current work-around is to use my own string-to-float function that looks something like the following.

import re

def str2float(s):
    # Signed mantissa followed directly by a signed exponent, e.g. "-1.67-6"
    if re.search(r'([-+][0-9]*\.[0-9]*)([-+][0-9]*)', s):
        base, exp = re.findall(r'([-+][0-9]*\.[0-9]*)([-+][0-9]*)', s)[0]
        s = f'{base}E{exp}'
    return float(s)

CodePudding user response:

How about something like this?

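def str2float(s):
    # Assume the exponent sign is '+' if one is present, otherwise '-'
    sign = '+' if '+' in s else '-'
    l = s.split(sign)
    if (len(l) == 3) and (l[0] == ''):
        # Leading sign and exponent sign are the same character, e.g. "-1.67-6"
        return float(sign + l[1] + "E" + sign + l[2])
    elif (len(l) == 2) and (l[0] != ''):
        # Only the exponent carries this sign, e.g. "1.67+6" or "-1.67+6"
        return float(l[0] + "E" + sign + l[1])
    else:
        return float(s)

str2float("-1.67+6") # -1670000.0
str2float("-1.67-6") # -1.67e-06
str2float("1.67+6")  # 1670000.0
str2float("1.67")    # 1.67
str2float("-1.67")   # -1.67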

CodePudding user response:

You could use re.sub in a way that leaves already-valid scientific notation untouched.

import re

def str_to_float(s: str) -> float:
    # Insert the missing 'E' before a sign that sits directly between two digits
    s = re.sub(r'(?<=\d)-(?=\d)', 'E-', s)
    s = re.sub(r'(?<=\d)\+(?=\d)', 'E+', s)
    return float(s)

print(str_to_float('-1.67E-6')) # -1.67e-06
print(str_to_float('-1.67-6')) # -1.67e-06
print(str_to_float('1.67+6')) # 1670000.0
print(str_to_float('1.67E6')) # 1670000.0
print(str_to_float('-1.67')) # -1.67
print(str_to_float('1.67')) # 1.67

(?<=\d) - positive lookbehind for a single digit (0 through 9).

- and \+ - match the characters - and + literally.

(?=\d) - positive lookahead for a single digit (0 through 9).

This locates the invalid scientific notation (as float sees it) and replaces it with valid scientific notation. A leading sign, or a sign that already follows 'E', is not preceded by a digit, so it is left alone.
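To see which characters the lookarounds actually pick out, here is a small sketch. The names BARE_EXPONENT and find_bare_exponent_signs are only illustrative, and the two patterns are folded into one character class for brevity.

```python
import re

# Same lookaround idea as above: a sign with a digit on both sides.
# BARE_EXPONENT is an illustrative name, not part of the answer's code.
BARE_EXPONENT = re.compile(r'(?<=\d)([-+])(?=\d)')

def find_bare_exponent_signs(s: str) -> list:
    """Return (index, sign) for every sign that sits directly between two digits."""
    return [(m.start(), m.group(1)) for m in BARE_EXPONENT.finditer(s)]

print(find_bare_exponent_signs('-1.67-6'))   # [(5, '-')]  the leading '-' is skipped
print(find_bare_exponent_signs('-1.67E-6'))  # []          the '-' follows 'E', not a digit
print(find_bare_exponent_signs('1.67+6'))    # [(4, '+')]
```

One limitation follows from the same rule: a mantissa that ends in a decimal point, such as '1.-6', is not matched, because the character before the sign is '.' rather than a digit. Whether that form can occur in the input is something to check against the program's format.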

CodePudding user response:

I think I've found what I was looking for.

import re

class float(float):
    def __new__(cls, s):
        # The dot is escaped so that valid input such as "-1.67E-6" is not rewritten
        if re.search(r'([-+]?[0-9]*\.[0-9]*)([-+][0-9]*)', s):
            base, exp = re.findall(r'([-+]?[0-9]*\.[0-9]*)([-+][0-9]*)', s)[0]
            s = f'{base}E{exp}'
        return super(float, cls).__new__(cls, s)

This appears to let me override the base behavior of float() the way I want. Is this a good idea? Bad idea? Why?
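One thing worth checking before shadowing the builtin: every float(...) call in that module then goes through the regex, including calls with non-string arguments such as float(3), where re.search raises a TypeError. A minimal sketch of a less invasive variant follows. It assumes a separately named subclass (SciFloat and _BARE_EXP are hypothetical names) and assumes the mantissa always contains at least one digit.

```python
import re

# Hypothetical helper pattern: a mantissa with at least one digit, followed
# directly by a signed integer exponent, e.g. "-1.67-6" or "1.67+6".
_BARE_EXP = re.compile(r'([-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+))([-+][0-9]+)')

class SciFloat(float):
    """float subclass that also accepts exponents written without 'E'."""

    def __new__(cls, value=0.0):
        # Only strings are inspected; numbers pass straight through to float.
        if isinstance(value, str):
            m = _BARE_EXP.fullmatch(value.strip())
            if m:
                value = f'{m.group(1)}E{m.group(2)}'
        return super().__new__(cls, value)

print(SciFloat('-1.67-6'))   # -1.67e-06
print(SciFloat('-1.67E-6'))  # -1.67e-06
print(SciFloat('-6'))        # -6.0
print(SciFloat(3))           # 3.0
```

Because the class has its own name, the builtin float() keeps its normal behavior everywhere else, and only the fields that need the odd format opt in.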
