I tried to write a user-defined function in Python that replaces the missing values in a dataset with the mean, median, or mode, but I am unable to get the required output.
Conditions:
Null values should be replaced by the column's mean when the column is not skewed, by the column's median when the column is skewed, and by the column's mode when the column is a categorical variable.
CodePudding user response:
Hope this helps. The function below fills the nulls in one column using the method you pass as choice:
import pandas as pd

def conditional_impute(df, column_name, choice):
    """Fill nulls in df[column_name] using 'mean', 'median', or 'mode'."""
    if choice == 'mean':
        fill_value = df[column_name].mean()
    elif choice == 'median':
        fill_value = df[column_name].median()
    elif choice == 'mode':
        # mode() returns a Series because ties are possible; take the first value
        fill_value = df[column_name].mode()[0]
    else:
        # An unknown choice raises no exception by itself, so reject it explicitly
        raise ValueError(f"choice must be 'mean', 'median', or 'mode', got {choice!r}")
    # Assign the result back: fillna(inplace=True) on a column selection may act on a copy
    df[column_name] = df[column_name].fillna(fill_value)
    return df

df = pd.DataFrame({'a': [1, 2, 34, 4], 'b': [None, 1, None, 4]})
df = conditional_impute(df, 'b', 'mode')
print(df)
The resulting DataFrame is:
    a    b
0   1  1.0
1   2  1.0
2  34  1.0
3   4  4.0
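The conditions in the question (mean for non-skewed columns, median for skewed columns, mode for categorical columns) could also be chosen automatically rather than passed in by hand. The following is only a minimal sketch under a few assumptions: the function name auto_impute and the SKEW_THRESHOLD cutoff of 0.5 are hypothetical choices, not something from the question, and every non-numeric column is treated as categorical. Adjust the threshold to whatever definition of "skewed" suits your data.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical cutoff: a column whose |skewness| exceeds this is treated as skewed.
SKEW_THRESHOLD = 0.5


def auto_impute(df, skew_threshold=SKEW_THRESHOLD):
    """Return a copy of df with nulls filled by mean, median, or mode per column."""
    result = df.copy()
    for column in result.columns:
        series = result[column]
        if not series.isna().any():
            continue  # nothing to fill in this column
        if is_numeric_dtype(series):
            # Numeric column: median if skewed, mean otherwise.
            skewness = series.skew()
            if pd.notna(skewness) and abs(skewness) > skew_threshold:
                fill_value = series.median()
            else:
                fill_value = series.mean()
        else:
            # Assumption: any non-numeric column is categorical, so use the mode.
            modes = series.mode()
            if modes.empty:
                continue  # column is entirely null, so no mode exists
            fill_value = modes[0]
        result[column] = series.fillna(fill_value)
    return result


df = pd.DataFrame({
    'symmetric': [1.0, 2.0, None, 4.0, 5.0],
    'skewed': [1.0, 1.0, 2.0, None, 100.0],
    'category': ['x', 'y', None, 'x', 'x'],
})
filled = auto_impute(df)
print(filled)
```

With this sample data, the 'symmetric' column should be filled with its mean, the 'skewed' column with its median, and the 'category' column with its most frequent value. The function works on a copy, so the original df keeps its nulls.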