I have a dataframe:
import numpy as np
import pandas as pd
data = {'Age':[18, np.nan, 17, 14, 15, np.nan, 17, 17]}
df = pd.DataFrame(data)
df
I would like to write a solution that imputes the missing values with either the mean or the median, using
df = df.fillna(df.mean())
or
df = df.fillna(df.median())
Desired output for mean
data = {'Age':[18, 16.3, 17, 14, 15, 16.3, 17, 17]}
df = pd.DataFrame(data)
df
Desired output for median
data = {'Age':[18, 17, 17, 14, 15, 17, 17, 17]}
df = pd.DataFrame(data)
df
CodePudding user response:
Use a function:
def f(df, func):
    # df.agg(func) computes one statistic per column; fillna aligns it by column name
    if func in ['mean', 'median']:
        return df.fillna(df.agg(func))
    else:
        raise ValueError("Wrong function, use only 'mean' or 'median'")
If you need the mean, use:
df = f(df, 'mean')
If you need the median, use:
df = f(df, 'median')
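The question's desired mean output shows 16.3, while the exact mean of the six known ages is 16.333333. As a minimal sketch, assuming one decimal place is what is wanted, the result can be rounded after filling. The variable names `mean_filled` and `median_filled` are illustrative only.

```python
import numpy as np
import pandas as pd

def f(df, func):
    # Fill NaN in every column with that column's mean or median
    if func in ['mean', 'median']:
        return df.fillna(df.agg(func))
    raise ValueError("Wrong function, use only 'mean' or 'median'")

df = pd.DataFrame({'Age': [18, np.nan, 17, 14, 15, np.nan, 17, 17]})

# Round only for display purposes; the unrounded fill value is 98 / 6
mean_filled = f(df, 'mean').round(1)
median_filled = f(df, 'median')

print(mean_filled)
print(median_filled)
```

Rounding is a presentation choice here; keep the unrounded values if the column feeds later calculations.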
CodePudding user response:
First cast the column to float, so that any 'nan' is parsed as a real missing value:
df = df.astype(float)
df = df.fillna(df.mean())
print(df)
Output:
         Age
0  18.000000
1  16.333333
2  17.000000
3  14.000000
4  15.000000
5  16.333333
6  17.000000
7  17.000000
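The cast to float only changes anything when the missing values arrived as text rather than as np.nan. The sketch below assumes a hypothetical input, `raw`, in which the gaps are the string 'nan'; the dataframe in the question already holds real NaN values, so this is an illustration of the cast, not a description of the asker's data.

```python
import pandas as pd

# Hypothetical input: missing ages stored as the string 'nan' (object dtype)
raw = pd.DataFrame({'Age': [18, 'nan', 17, 14, 15, 'nan', 17, 17]})

# The cast turns the 'nan' strings into real NaN values that fillna recognizes
as_float = raw.astype(float)
filled = as_float.fillna(as_float.mean())

print(filled)
```

Without the cast, fillna would leave the 'nan' strings untouched, because they are ordinary text and not missing values.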
As a function:
def f(df, func):
    # The boolean index is 0 for False and 1 for True, so the mean must come second
    return df.fillna([df.median(), df.mean()][func == 'mean'])
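Indexing a list with a boolean is compact, but the order of the list is easy to get backwards, and any name other than 'mean' silently falls through to the other statistic. One possible alternative, sketched here with a hypothetical helper named `fill_with`, is a dictionary keyed by the statistic's name.

```python
import numpy as np
import pandas as pd

def fill_with(df, func):
    # Map each supported name to the DataFrame method that computes it
    stats = {'mean': df.mean, 'median': df.median}
    if func not in stats:
        raise ValueError("Wrong function, use only 'mean' or 'median'")
    # Call the chosen method and fill NaN column by column
    return df.fillna(stats[func]())

df = pd.DataFrame({'Age': [18, np.nan, 17, 14, 15, np.nan, 17, 17]})
by_mean = fill_with(df, 'mean')
by_median = fill_with(df, 'median')
```

With this layout, adding another statistic means adding one dictionary entry, and an unsupported name raises an error instead of quietly picking a default.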