Python: Can I replace missing values marked as e.g. "Unknown" with NaN in a dataframe column?

Time:12-13

I am trying to calculate percentages, make graphs, etc. from my dataframe, but many of the missing values are not marked as NaN. Instead they appear as strings such as '0::Unkown' and '|Unkown', which makes everything very messy. I only want to include the "Yes"/"No" answers, which do exist but are heavily outnumbered by the "0::Unkown" string values.

Is there a way to get rid of them and convert them to NaN?

I have tried fillna(), lambda functions, replace() and so on, following several examples, but nothing seems to help.

Thank you!

The column in my dataframe in question

CodePudding user response:

If the gun_stolen column is supposed to contain boolean values, the easiest way is to use pd.to_numeric. Note that this maps every value that cannot be parsed as a number to 'No' rather than to NaN:

df['gun_stolen'] = pd.to_numeric(df['gun_stolen'], errors='coerce').fillna(0) \
                     .astype(bool).replace({True: 'Yes', False: 'No'})
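
If the valid answers are the literal strings "Yes" and "No", as the question describes, a whitelist may fit better than numeric coercion. The following is only a sketch under that assumption: the sample data and the valid_answers list are hypothetical, and the real column may hold different values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real column contents are an assumption.
df = pd.DataFrame(
    {"gun_stolen": ["Yes", "0::Unkown", "No", "|Unkown", np.nan, "Yes"]}
)

# Answers we want to keep (assumed from the question's "Yes/No" wording).
valid_answers = ["Yes", "No"]

# Series.where keeps values where the condition is True and
# replaces everything else with NaN by default.
df["gun_stolen"] = df["gun_stolen"].where(df["gun_stolen"].isin(valid_answers))

print(df["gun_stolen"].tolist())
```

The advantage of this approach is that new spellings of "unknown" do not have to be listed one by one; anything outside the whitelist becomes NaN.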

CodePudding user response:

Code:

import pandas as pd
import numpy as np
df = pd.DataFrame([np.nan, '0::Unkown', '|Unkown'], columns=['gun_stolen'])

print(df)

Output:

  gun_stolen
0        NaN
1  0::Unkown
2    |Unkown

Code:

df.gun_stolen.replace({'0::Unkown' : np.nan, '|Unkown' : np.nan})

Output:

0     NaN
1     NaN
2     NaN
Name: gun_stolen, dtype: object

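This should do the job, as described in the pandas documentation. Note that replace returns a new Series and does not modify the dataframe, so assign the result back to df['gun_stolen'] to keep the change.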
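
When the placeholder strings come in many variants, listing each one in a dict gets tedious. A possible alternative, sketched here with hypothetical sample data, is to pass a regular expression to replace and then compute percentages on what remains. The pattern below assumes every placeholder contains "Unkown" or "Unknown".

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real column contents are an assumption.
df = pd.DataFrame(
    {"gun_stolen": ["Yes", "0::Unkown", "No", "|Unkown", "1::Unknown", "Yes"]}
)

# Replace any string containing "Unkown" or "Unknown" with NaN.
# regex=True tells pandas to treat the first argument as a pattern.
cleaned = df["gun_stolen"].replace(r".*Unkn?own.*", np.nan, regex=True)

# replace returns a new Series, so assign it back to keep the result.
df["gun_stolen"] = cleaned

# value_counts skips NaN by default; normalize=True gives fractions.
shares = df["gun_stolen"].value_counts(normalize=True)
print(shares * 100)
```

Because value_counts ignores NaN unless dropna=False is passed, the percentages are computed over the "Yes"/"No" answers only.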
