Numpy converting data

Time:10-25

I'm using pandas (pd.read_csv) to read data from a CSV file. It contains three columns: offer_id, sms_limit, and sms_price. I want to add validation:

  • offer_id - only positive integers
  • sms_limit - only positive integers
  • sms_price - positive float number.

I've tried to write my own validator, something like this:

def int_validator(x):
    if str(x).isdigit():
        return x
    raise ValueError('Invalid choice please use positive integer number')


pd.read_csv(
            converters={'offer_id': int_validator, 'sms_limit': int, 'sms_price': int},
            encoding='utf-8',
            engine='python',
        )

but it doesn't work at all :(

It only works if I use int

pd.read_csv(
            converters={'offer_id': int, 'sms_limit': int, 'sms_price': int},
            encoding='utf-8',
            engine='python',
        )

but that's not what I'm looking for. Also, it only works for the offer_id column: if I type a string into sms_limit or sms_price, there is no validation. Can somebody explain how to write my own validators, and why only the first column accepts the int conversion?
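
For reference, the first validator can be exercised on its own, outside pandas. The sketch below assumes that each cell reaches a converter as text; the helper accepts() and the samples list are hypothetical names added only for this check. It shows which strings str.isdigit() lets through, and that an accepted value comes back unchanged as a string rather than as an int.

```python
def int_validator(x):
    # Same logic as the validator in the question.
    if str(x).isdigit():
        return x
    raise ValueError('Invalid choice please use positive integer number')


def accepts(value):
    """Return True if int_validator lets the value through (hypothetical helper)."""
    try:
        int_validator(value)
        return True
    except ValueError:
        return False


# Assumption: converters are handed text, so the samples are strings.
samples = ['12', '0', ' 12', '-3', '7.0', '']
results = {s: accepts(s) for s in samples}
print(results)

# An accepted value is returned as-is, so it is still a str, not an int.
print(type(int_validator('12')).__name__)
```

Only '12' and '0' pass. A leading space, a minus sign, a decimal point, or an empty cell is each enough to raise ValueError, so a file with a space after every comma would fail on the first row.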

CodePudding user response:

Here's a solution that correctly checks if the first two columns contain positive integers and if the last column contains positive floats.

import pandas as pd

# Parse the cell as a float first so that "7.0" is accepted: int("7.0") raises
# ValueError even though int(float("7.0")) does not. Then reject values with a
# fractional part or a negative sign.
# Change the check to value <= 0 if you want strictly positive values.
def int_validator(x):
    message = 'Invalid choice for {!r}. Please use positive integer number'.format(x)
    try:
        value = float(x)
    except (TypeError, ValueError):
        raise ValueError(message) from None
    # is_integer() is False for nan and inf, so those are rejected here too.
    if not value.is_integer() or value < 0:
        raise ValueError(message)
    return int(value)

# Same idea as int_validator, but the value only has to parse as a float.
def float_validator(x):
    message = 'Invalid choice for {!r}. Please use positive float number'.format(x)
    try:
        value = float(x)
    except (TypeError, ValueError):
        raise ValueError(message) from None
    # "not value >= 0" is also True for nan, so nan is rejected as well.
    if not value >= 0:
        raise ValueError(message)
    return value

# Now we apply the validators to all the columns.
pd.read_csv(
    "example.csv",
    converters={'offer_id': int_validator, 'sms_limit': int_validator, 'sms_price': float_validator},
    encoding='utf-8',
    engine='python',
)
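
As for why only the first column seemed to be converted in the question: one possible explanation, and it is only a guess because the CSV file itself is not shown, is that the header has a space after each comma. In that case the column names would be ' sms_limit' and ' sms_price', the converter keys would not match them, and pandas would skip those converters without complaint. The sketch below uses made-up file contents (the raw string is hypothetical) to show that behaviour, along with the skipinitialspace option of read_csv.

```python
import io

import pandas as pd

# Hypothetical file contents: note the space after each comma.
raw = "offer_id, sms_limit, sms_price\n1, 100, 0.5\n2, abc, 0.7\n"
converters = {'offer_id': int, 'sms_limit': int, 'sms_price': float}

# Without skipinitialspace the header names keep their leading space, so only
# the 'offer_id' key matches a column and 'abc' passes through unchecked.
loose = pd.read_csv(io.StringIO(raw), converters=converters, engine='python')
print(list(loose.columns))

# With skipinitialspace=True the names match, every converter runs, and the
# bad cell in sms_limit raises an error.
try:
    pd.read_csv(io.StringIO(raw), converters=converters,
                engine='python', skipinitialspace=True)
    error = None
except ValueError as exc:
    error = exc
print(error)
```

If your file looks like this, passing skipinitialspace=True together with the validators above should make all three columns go through validation. Printing the column names of the loaded DataFrame is a quick way to check whether this is what is happening.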

Let me know if you have questions!
