I have the following DataFrame:
name   nationality  income
Joe    American     30000
Mira   Iraqi        NaN
Maria  Spanish      87000
I would like to calculate the mean of the income column and replace the missing value NaN with that mean.
When I write:
mean = df["income"].mean()
df["income"].replace(np.nan,mean)
I get:
TypeError: can only concatenate str (not "int") to str
I tried mean(skipna=True) to exclude NaN from the mean calculation, but I get the same error.
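One likely cause, assumed here because the question does not show df.dtypes, is that income was loaded as text (object dtype). In that case mean() has to add strings, which fails regardless of skipna. The sketch below rebuilds the table with income stored as hypothetical string values. It converts the column with pd.to_numeric(errors="coerce") before taking the mean:

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction: income stored as strings (object dtype),
# which can happen when a CSV column contains non-numeric text.
df = pd.DataFrame({
    "name": ["Joe", "Mira", "Maria"],
    "nationality": ["American", "Iraqi", "Spanish"],
    "income": ["30000", np.nan, "87000"],
})
print(df["income"].dtype)  # object, not a numeric dtype

# Convert to numbers; values that cannot be parsed become NaN instead of raising.
df["income"] = pd.to_numeric(df["income"], errors="coerce")

# The column is now float64, so mean() works and skips NaN by default.
df["income"] = df["income"].fillna(df["income"].mean())
print(df)
```

If df["income"].dtype already reports a numeric dtype, this conversion is unnecessary and the error comes from somewhere else.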
Answer:
Use fillna() with the column mean:
df['income'] = df['income'].fillna(df['income'].mean())
Answer:
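This should work. It does with my dummy dataset, as long as the result is assigned back:
import numpy as np
import pandas as pd
df = pd.DataFrame({'income': [20, np.nan, 20, 10]})
mean = df.income.mean()
df.income = df.income.replace(np.nan, mean)  # replace() returns a new Series; assign it back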
Alternatively, assign the mean directly to the rows where income is missing:
df.loc[df.income.isnull(), 'income'] = df.income.mean()
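To check that the approaches in these answers agree, here is a small comparison on dummy data shaped like the question's income column. The values are assumed to be numeric already:

```python
import numpy as np
import pandas as pd

# Dummy data modeled on the question's income column (numeric dtype assumed).
df = pd.DataFrame({"income": [30000, np.nan, 87000]})
mean = df["income"].mean()  # NaN is skipped by default (skipna=True)

# 1) fillna returns a new Series with NaN replaced.
via_fillna = df["income"].fillna(mean)

# 2) replace returns a new Series as well; nothing changes unless it is assigned.
via_replace = df["income"].replace(np.nan, mean)

# 3) .loc assignment modifies the frame in place, so work on a copy here.
df_loc = df.copy()
df_loc.loc[df_loc.income.isnull(), "income"] = mean
via_loc = df_loc["income"]

# All three should produce the same float64 Series.
print(via_fillna.equals(via_replace) and via_fillna.equals(via_loc))
```

fillna states the intent most directly. The .loc form is useful when the condition is more than just "is missing".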
Answer:
You can use .fillna() to replace all NaN values with the column mean, like this:
df.income = df.income.fillna(df.income.mean())