Calculate the mean of column that contains NaN values

Time:10-01

I have:

name   nationality  income 
Joe    American     30000
Mira   Iraqi        NaN
Maria  Spanish      87000

I would like to calculate the mean of the income column and replace the missing value NaN with that mean.

When I write:

mean = df["income"].mean()
df["income"].replace(np.nan,mean)

I get:

TypeError: can only concatenate str (not "int") to str

I tried

mean(skipna=True)

to ignore the NaN from the mean calculation, but I get the same result.

CodePudding user response:

Use:

df['income'] = df['income'].fillna(df['income'].mean())
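The assignment matters here: fillna() (like replace()) returns a new Series and leaves the original column alone unless you assign the result back. A minimal sketch of the difference, assuming a numeric income column; the sample values and the n_missing_* names are only for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative data, assuming the column is already numeric.
df = pd.DataFrame({"income": [30000, np.nan, 87000]})
mean = df["income"].mean()  # NaN is skipped by default

# Result is discarded, so the DataFrame still has its NaN.
df["income"].fillna(mean)
n_missing_before = int(df["income"].isna().sum())

# Assigning the result back actually updates the column.
df["income"] = df["income"].fillna(mean)
n_missing_after = int(df["income"].isna().sum())
```

The same applies to the replace() call in the question: without an assignment, the filled values are never stored.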

CodePudding user response:

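This should work. It does when I try it with a dummy dataset:

import numpy as np
import pandas as pd

df = pd.DataFrame({'income': [20, np.nan, 20, 10]})
mean = df.income.mean()
# replace() returns a new Series, so assign it back to keep the result
df.income = df.income.replace(np.nan, mean)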

You can also try the following:

df.loc[df.income.isnull(), 'income'] = df.income.mean()
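If you still get the TypeError, it suggests that your income column is not numeric, i.e. it has object dtype and holds at least some strings, which is common after reading from a CSV. A rough sketch of one way to handle that, assuming the values are number-like strings (the sample data below is made up to mirror the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: incomes stored as strings, with one missing value.
df = pd.DataFrame({
    "name": ["Joe", "Mira", "Maria"],
    "nationality": ["American", "Iraqi", "Spanish"],
    "income": ["30000", np.nan, "87000"],
})

# Convert to numbers; anything that cannot be parsed becomes NaN.
df["income"] = pd.to_numeric(df["income"], errors="coerce")

# mean() skips NaN by default, so only the two known incomes are averaged.
income_mean = df["income"].mean()
df["income"] = df["income"].fillna(income_mean)
```

Checking df.dtypes first will tell you whether the conversion is needed at all.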

CodePudding user response:

You can use .fillna() to replace all NaN values like this:

df.income = df.income.fillna(df.income.mean())