I am trying to calculate percentages, make graphs, etc. from my dataframe, but many of the missing values are not marked as NaN; instead they appear as strings such as '0::Unkown' or '|Unkown'. This of course makes everything very messy. I only want to include the "Yes"/"No" answers, which exist but are heavily outnumbered by the '0::Unkown' string values.
Is there a way to get rid of them and convert them to NaN?
I have tried fillna(), lambda functions, replace, and so on, following multiple examples, but nothing seems to help.
Thank you!
[Image: the column in question in my dataframe]
CodePudding user response:
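If the gun_stolen column is supposed to contain boolean values stored as numbers (0/1), the easiest way is to use pd.to_numeric. Note that errors='coerce' turns every non-numeric string into NaN, and the fillna(0) that follows then maps those rows to 'No' rather than leaving them missing: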
df['gun_stolen'] = pd.to_numeric(df['gun_stolen'], errors='coerce').fillna(0) \
.astype(bool).replace({True: 'Yes', False: 'No'})
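If the valid answers are instead stored as the literal strings 'Yes' and 'No' (an assumption, since the real column is not shown), a whitelist approach may be closer to what the question asks for. The sketch below uses made-up sample data and keeps only the recognised answers, turning everything else into NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real column's exact values are an assumption.
df = pd.DataFrame({'gun_stolen': ['Yes', '0::Unkown', 'No', '|Unkown', np.nan, 'No']})

# Keep only the recognised answers; where() sets every other row to NaN.
valid = ['Yes', 'No']
df['gun_stolen'] = df['gun_stolen'].where(df['gun_stolen'].isin(valid))

# value_counts() skips NaN by default, so the shares cover only Yes/No rows.
shares = df['gun_stolen'].value_counts(normalize=True)
print(shares)
```

Because the placeholders become real NaN values, percentage calculations and plots built on value_counts() ignore them without further filtering.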
CodePudding user response:
Code:
import pandas as pd
import numpy as np
df = pd.DataFrame([np.nan, '0::Unkown', '|Unkown'], columns=['gun_stolen'])
print(df)
Output:
gun_stolen
0 NaN
1 0::Unkown
2 |Unkown
Code:
df['gun_stolen'] = df['gun_stolen'].replace({'0::Unkown': np.nan, '|Unkown': np.nan})
print(df['gun_stolen'])
Output:
0 NaN
1 NaN
2 NaN
Name: gun_stolen, dtype: object
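This should do the job, as described in the pandas documentation for replace. Keep in mind that replace returns a new Series rather than modifying the dataframe in place, so assign the result back to the column if you want to keep it.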
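If there are more placeholder variants than can be listed by hand, one option is to match on the shared text instead. This is only a sketch: it assumes every placeholder contains the substring 'Unkown', and the sample values '1::Unkown' and 'Yes' are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, including a variant not listed in the question.
df = pd.DataFrame({'gun_stolen': [np.nan, '0::Unkown', '|Unkown', '1::Unkown', 'Yes']})

# Flag every string containing the placeholder text.
# na=False makes rows that are already NaN count as "no match".
is_placeholder = df['gun_stolen'].str.contains('Unkown', regex=False, na=False)

# mask() sets the flagged rows to NaN and leaves the others unchanged.
df['gun_stolen'] = df['gun_stolen'].mask(is_placeholder)
print(df)
```

The substring match is broader than an explicit mapping, so it is worth checking the flagged values first (for example with df.loc[is_placeholder, 'gun_stolen'].unique() before the assignment) to make sure no genuine answers are caught.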