I have a DataFrame with 40 columns and 2.5 million rows (many financial securities).
The column df['AMT_ISSUED'] is supposed to contain integers, but when I try
df2 = df.loc[df['AMT_ISSUED']>=1]
I get this error:
'>=' not supported between instances of 'str' and 'int'
When I convert the column to int with
df['AMT_ISSUED'] = df['AMT_ISSUED'].astype('int64')
I get: invalid literal for int() with base 10:
Maybe I have a very big number in there?
My question is: how can I create a column df['IS DIGIT'] (True or False) using the isdigit() function, so I can start inspecting the data?
Answer:
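Use the vectorized string accessor. Converting to str first matters: apply(str.isnumeric) raises a TypeError on any cell that is not a str (for example an int or a NaN float).
df["IS DIGIT"] = df["AMT_ISSUED"].astype(str).str.isdigit()
Note that values such as "1.5", "-3" or "1,000" are reported as False, because they contain characters other than digits.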
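A small self-contained check of the string-based test, run on made-up sample values (the real AMT_ISSUED data is not shown in the question). It shows which kinds of strings pass and which do not.

```python
import pandas as pd

# Hypothetical sample values; the real AMT_ISSUED column is not shown in the question.
df = pd.DataFrame(
    {"AMT_ISSUED": ["1000", "250000", "1,000", "1.5", "-3", " 42", "N/A", 500]},
    dtype=object,
)

# Cast every cell to str first so non-string cells (like the int 500) don't raise,
# then test whether each string consists only of digit characters.
df["IS DIGIT"] = df["AMT_ISSUED"].astype(str).str.isdigit()

print(df)
```

Thousands separators, decimal points, minus signs and surrounding spaces all make the check return False, so the rows flagged False are a good starting point for cleaning.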
Answer:
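Use apply with a custom function like this (the same code works in Python 2 and 3). Note that it checks the Python type of each value, not its text, so a numeric string such as "1000" is reported as False.
def isdigit(x):
    # bool is a subclass of int, so it is excluded explicitly
    return isinstance(x, (int, float, complex)) and not isinstance(x, bool)
df['IS DIGIT'] = df['AMT_ISSUED'].apply(isdigit)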
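To see what the type-based check reports on a mixed column, here is a sketch on a hypothetical object Series (the values are invented, not taken from the question's data):

```python
import numpy as np
import pandas as pd

def isdigit(x):
    # True only for numeric Python objects; bool is excluded because it subclasses int.
    return isinstance(x, (int, float, complex)) and not isinstance(x, bool)

# Hypothetical mixed object column, similar to what read_csv can produce.
s = pd.Series([1000, 2.5, "1000", "N/A", True, np.nan], dtype=object)

result = s.apply(isdigit).tolist()
print(result)
```

Two things to be aware of: the numeric string "1000" counts as non-numeric, and NaN counts as numeric because it is a float. If the column was read from a file as strings, the string-based check in the previous answer is usually the more informative one.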
You can count how many cells are numeric (True) and how many are not (False) with
df['IS DIGIT'].value_counts()
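A related approach, not taken from either answer, is pd.to_numeric with errors="coerce", which turns every value it cannot parse into NaN. That lets you count and list the offending raw values and still run the original >= 1 filter. The sample data below is hypothetical.

```python
import pandas as pd

# Hypothetical sample; replace with the real df.
df = pd.DataFrame({"AMT_ISSUED": ["1000", "250000", "1,000", "", "N/A", "75"]})

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
parsed = pd.to_numeric(df["AMT_ISSUED"], errors="coerce")

# Cells that had a value but could not be parsed.
bad = parsed.isna() & df["AMT_ISSUED"].notna()

print(bad.sum())                                 # number of unparseable cells
print(df.loc[bad, "AMT_ISSUED"].value_counts())  # which raw values they are

# The original filter now works; NaN compares as False, so bad rows drop out.
df2 = df.loc[parsed >= 1]
print(df2)
```

Coercing silently drops the bad rows from the filter, so it is worth inspecting the listed values before relying on the result.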
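Also note that "ValueError: invalid literal for int() with base 10" occurs when int() is given a string that is not a valid integer literal, for example "1.5", "1,000", an empty string or "N/A". A very large number would not cause this particular message, because int() accepts digit strings of any length.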
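A quick way to confirm which strings trigger that ValueError is to try int() on each one. The helper name parses_as_int is hypothetical, introduced only for this illustration:

```python
def parses_as_int(text):
    """Return True if int() accepts the string (hypothetical helper for illustration)."""
    try:
        int(text)
    except ValueError:
        return False
    return True

# A 20-digit string parses fine; decimals, separators, blanks and labels do not.
samples = ["1000", "99999999999999999999", "1.5", "1,000", "", "N/A"]
for text in samples:
    print(repr(text), parses_as_int(text))
```

Applying the same helper with df['AMT_ISSUED'].apply(parses_as_int) would flag the rows that make astype('int64') fail.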