Find which rows are not digits in a specific DataFrame column?

I have a DataFrame with 40 columns x 2.5 million rows (many financial securities).

df['AMT_ISSUED'] is supposed to contain integers, but I am getting an error such as

 '>=' not supported between instances of 'str' and 'int'

when I try df2 = df.loc[df['AMT_ISSUED'] >= 1]

When I try converting to int with df['AMT_ISSUED'] = df['AMT_ISSUED'].astype('int64')

I get: invalid literal for int() with base 10:

Maybe I have a very big number in there?

My question is: how can I create a df['IS DIGIT'] column (True or False) using the isdigit() function, so I can start inspecting the data?

CodePudding user response：

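Use the string accessor, converting to str first so that any non-string values in the column do not raise a TypeError:

df["IS_DIGIT"] = df["AMT_ISSUED"].astype(str).str.isnumeric()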
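Here is a minimal sketch of that check on a small made-up frame. The sample values are assumptions, not the asker's data, and the astype(str) step is there because calling a str method directly on a real integer would raise a TypeError.

```python
import pandas as pd

# Hypothetical sample standing in for the real 2.5-million-row frame.
df = pd.DataFrame({"AMT_ISSUED": [1000, "2500", "N/A", "1,000,000", "3.5"]})

# Convert every value to its string form first, then test each string.
# isnumeric() is True only when every character is numeric, so commas,
# decimal points and letters all make it False.
df["IS_DIGIT"] = df["AMT_ISSUED"].astype(str).str.isnumeric()

# Rows whose AMT_ISSUED is not made up purely of numeric characters.
bad_rows = df.loc[~df["IS_DIGIT"]]
print(bad_rows)
```

Filtering with ~df["IS_DIGIT"] shows the offending values directly, which is usually more useful for inspection than the flag column alone.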

CodePudding user response：

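Use apply with a custom function like this. Note that it tests whether each value is already a numeric type, so a string such as '100' is flagged as False.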

Python 3.x

def isdigit(x):
    return isinstance(x, (int, float, complex)) and not isinstance(x, bool)

df['IS DIGIT'] = df['AMT_ISSUED'].apply(isdigit)

Python 2.x

def isdigit(x):
    return isinstance(x, (int, long, float, complex)) and not isinstance(x, bool)

df['IS DIGIT'] = df['AMT_ISSUED'].apply(isdigit)

You can check how many non-numeric values there are in your DataFrame with

df['IS DIGIT'].value_counts()
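As a rough sketch of how the counts could be taken one step further, the example below reuses the type check from this answer on a small made-up column and then breaks the flagged entries down by Python type. The sample values and the names counts and bad_types are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample; the real column would come from the asker's data.
df = pd.DataFrame({"AMT_ISSUED": [1000, "2500", "N/A", 7.5, True]})

def isdigit(x):
    # True only for real numeric types; bool is excluded because it subclasses int.
    return isinstance(x, (int, float, complex)) and not isinstance(x, bool)

df["IS DIGIT"] = df["AMT_ISSUED"].apply(isdigit)

# Count numeric versus non-numeric entries.
counts = df["IS DIGIT"].value_counts()
print(counts)

# Break the non-numeric entries down by the name of their Python type.
bad_types = (
    df.loc[~df["IS DIGIT"], "AMT_ISSUED"]
    .map(lambda v: type(v).__name__)
    .value_counts()
)
print(bad_types)
```

If almost everything shows up as str, the column was probably read in as text, and the individual strings need inspecting instead of their types.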

Also, note that ValueError: invalid literal for int() with base 10 is raised when you try to convert a string that is not formatted as an integer, for example one containing a decimal point or non-digit characters.
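One way to find the values that trigger this error, sketched here as an alternative to the approaches above and not as something either answer proposed, is pd.to_numeric with errors="coerce". It turns unparseable entries into NaN instead of raising. The sample data and the names as_number and unparseable are assumptions.

```python
import pandas as pd

# Hypothetical sample, including the kinds of entries that break astype('int64').
df = pd.DataFrame({"AMT_ISSUED": ["1000", "2500", "N/A", "1,000,000", 500]})

# errors="coerce" replaces anything that cannot be parsed with NaN.
as_number = pd.to_numeric(df["AMT_ISSUED"], errors="coerce")

# Rows that held a value but could not be parsed as a number.
unparseable = df.loc[as_number.isna() & df["AMT_ISSUED"].notna()]
print(unparseable)

# The comparison from the question now works on the parsed values.
df2 = df.loc[as_number >= 1]
print(df2)
```

Once the unparseable rows have been reviewed and cleaned or dropped, the parsed column can be cast to an integer type.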