Create a function in Python that imputes mean or median values in a pandas DataFrame

Time:09-24

I have a DataFrame:

import numpy as np
import pandas as pd

data = {'Age': [18, np.nan, 17, 14, 15, np.nan, 17, 17]}
df = pd.DataFrame(data)
df

I would like to write a function that imputes either the mean or the median, using one of:

df = df.fillna(df.mean())
df = df.fillna(df.median())

Desired output for mean

data = {'Age':[18, 16.3, 17, 14, 15, 16.3, 17, 17]} 
df = pd.DataFrame(data) 
df

Desired output for median

data = {'Age':[18, 17, 17, 14, 15, 17, 17, 17]} 
df = pd.DataFrame(data) 
df

CodePudding user response:

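Use a function that checks the argument and passes it to agg:

def f(df, func):
    # func is the name of the statistic used to fill the missing values
    if func in ['mean', 'median']:
        return df.fillna(df.agg(func))
    else:
        raise ValueError("Wrong function, use only 'mean' or 'median'")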

If you need the mean, use:

df = f(df, 'mean')

If you need the median, use:

df = f(df, 'median')
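
The function above aggregates every column, which may fail or give unwanted results if the DataFrame also holds text columns. The following is a minimal sketch of one way to restrict the imputation to numeric columns. The helper name impute_numeric and the 'Name' column are assumptions made for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd


def impute_numeric(df, func):
    """Fill NaN in numeric columns with that column's mean or median.

    Hypothetical helper, not part of the answer above.
    """
    if func not in ("mean", "median"):
        raise ValueError("Wrong function, use only 'mean' or 'median'")
    # numeric_only=True computes the statistic for numeric columns only,
    # so text columns are skipped rather than aggregated.
    fill_values = getattr(df, func)(numeric_only=True)
    # fillna with a Series matches on column labels; other columns are untouched.
    return df.fillna(fill_values)


# The 'Name' column is an assumption added to show a mixed DataFrame.
mixed = pd.DataFrame({
    "Name": list("abcdefgh"),
    "Age": [18, np.nan, 17, 14, 15, np.nan, 17, 17],
})
mean_filled = impute_numeric(mixed, "mean")
median_filled = impute_numeric(mixed, "median")
print(mean_filled)
print(median_filled)
```

fillna returns a new DataFrame, so mixed keeps its missing values unless the result is assigned back to it.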

CodePudding user response:

First cast the column to float so that the missing values are treated as NaN:

df = df.astype(float)
df = df.fillna(df.mean())
print(df)

Output:

         Age
0  18.000000
1  16.333333
2  17.000000
3  14.000000
4  15.000000
5  16.333333
6  17.000000
7  17.000000

As a function:

def f(df, func):
    # A boolean indexes the list as 0 or 1, so 'mean' (True) must select index 1
    return df.fillna([df.median(), df.mean()][func == 'mean'])
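
Indexing a two-element list with a boolean depends on the order of the list, and any string other than 'mean' silently selects the other entry. A dictionary lookup is one possible alternative that makes the mapping explicit and rejects unknown names. This is only a sketch: STATISTICS and fill_with are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd

# Each supported name maps to the DataFrame method that computes it.
STATISTICS = {"mean": pd.DataFrame.mean, "median": pd.DataFrame.median}


def fill_with(df, func="mean"):
    """Hypothetical variant of f that looks the statistic up by name."""
    if func not in STATISTICS:
        raise ValueError(
            f"Unknown statistic {func!r}, expected one of {sorted(STATISTICS)}"
        )
    # Call the selected method on df and use the result as fill values.
    return df.fillna(STATISTICS[func](df))


df = pd.DataFrame({"Age": [18, np.nan, 17, 14, 15, np.nan, 17, 17]})
print(fill_with(df, "mean"))
print(fill_with(df, "median"))
```

Supporting another statistic would then only need one more dictionary entry.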