This is my task: Write a function that accepts a dataframe as input, the name of the column with missing values , and a list of grouping columns and returns the dataframe by filling in missing values with the median value
Here is that I tried to do:
def fillnull(set,col):
val = {col:set[col].sum()/set[col].count()}
set.fillna(val)
return set
fillnull(titset,'Age')
My problem is that my function doesn't work, also I don't know how to count median and how to group through this function Here are photos of my dataframe and missing values of my dataset
CodePudding user response:
def fillnull(set,col):
set[col] = set[col].fillna(set[col].median())
return set
fillnull(titset,'Age')
CodePudding user response:
Check does this code works for you
import pandas as pd
df = pd.DataFrame({
'processId': range(100, 900, 100),
'groupId': [1, 1, 2, 2, 3, 3, 4, 4],
'other': [1, 2, 3, None, 3, 4, None, 9]
})
print(df)
def fill_na(df, missing_value_col, grouping_col):
values = df.groupby(grouping_col)[missing_value_col].median()
df.set_index(grouping_col, inplace=True)
df.other.fillna(values, inplace=True)
df.reset_index(grouping_col, inplace=True)
return df
fill_na(df, 'other', 'groupId')