How do I replace missing values with NaN

Time:05-06

I am using the IMDb dataset for machine learning, and it contains many missing values, which are entered as '\N'. Specifically, I want to convert the startYear column, which holds each movie's release year, to integers, but I'm not able to do that right now. I could drop these values, but I wanted to see why they're missing first. I have tried several things without success.

This is my latest attempt:

[Image: My attempt]

CodePudding user response:

Here is a way to do it without using replace:

import pandas as pd
import numpy as np

# Rebuild the value distribution; '\N' marks a missing year
df_basics = pd.DataFrame({'startYear': ['\\N']*78760 + [2017]*18267 + [2018]*18263 + [2016]*17837 + [2019]*17769 + ['1996 ', '1993 ', '2000 ', '2019 ', '2029 ']})
print(df_basics.startYear.value_counts())
# The boolean mask selects the '\N' rows, which are then overwritten with NaN
df_basics.loc[df_basics.startYear == '\\N', 'startYear'] = np.nan
print(df_basics.startYear.value_counts(dropna=False))

Output:

NaN      78760
2017     18267
2018     18263
2016     17837
2019     17769
1996         1
1993         1
2000         1
2019         1
2029         1
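
Since the goal in the question is an integer column, here is one possible follow-up, offered as a sketch rather than as part of the answer above. It assumes a pandas version with the nullable 'Int64' dtype and uses a small made-up sample in place of the real data. Stripping whitespace first takes care of values such as '1996 ', and pd.to_numeric with errors='coerce' turns anything unparseable, including '\N', into NaN.

```python
import pandas as pd

# Hypothetical sample mirroring the mix of values shown above
df_basics = pd.DataFrame({'startYear': ['\\N', 2017, 2018, '1996 ', '\\N']})

# Normalise everything to stripped strings so '1996 ' parses cleanly
cleaned = df_basics['startYear'].astype(str).str.strip()

# Unparseable entries such as '\N' become NaN instead of raising
years = pd.to_numeric(cleaned, errors='coerce')

# The nullable integer dtype stores the years as integers and the gaps as <NA>
df_basics['startYear'] = years.astype('Int64')

print(df_basics['startYear'])
print(df_basics['startYear'].isna().sum())
```

The missing rows are still present after this, so they can be inspected with df_basics[df_basics['startYear'].isna()] before deciding whether to drop them.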