Grouping and computing the median in a pandas DataFrame-CodePudding

This is my task: write a function that accepts a dataframe, the name of a column with missing values, and a list of grouping columns, and returns the dataframe with the missing values filled in with the median value.

Here is what I tried:

def fillnull(set,col):
   val = {col:set[col].sum()/set[col].count()}
   set.fillna(val)
   return set

fillnull(titset,'Age')

My problem is that my function doesn't work. I also don't know how to compute the median or how to group inside this function. Here are screenshots of my dataframe and of the missing values in my dataset:

DATAFRAME

NaN Values

CodePudding user response:

def fillnull(df, col):
    # Fill missing values in col with the column's overall median
    df[col] = df[col].fillna(df[col].median())
    return df

fillnull(titset,'Age')
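
As a rough illustration of why the attempt in the question has no visible effect: fillna returns a new object unless its result is assigned back, and sum()/count() gives the mean rather than the median. The sketch below uses a small made-up frame (the name sample and its values are assumptions, not the asker's data).

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's dataset
sample = pd.DataFrame({'Age': [20.0, None, 30.0, 100.0]})

# fillna returns a new object; without assignment the original is unchanged
sample.fillna({'Age': 0})
still_missing = int(sample['Age'].isna().sum())

# sum() / count() skips NaN and yields the mean, which is not the median
mean_age = sample['Age'].sum() / sample['Age'].count()
median_age = sample['Age'].median()

# Assigning the result back is what makes the fill persist
sample['Age'] = sample['Age'].fillna(median_age)

print(still_missing, mean_age, median_age, sample['Age'].tolist())
```

Here the mean is pulled up by the single large value, while the median stays at the middle of the observed ages.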

CodePudding user response:

Check whether this code works for you:

import pandas as pd

df = pd.DataFrame({
    'processId': range(100, 900, 100),
    'groupId': [1, 1, 2, 2, 3, 3, 4, 4],
    'other': [1, 2, 3, None, 3, 4, None, 9]
})

print(df)

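def fill_na(df, missing_value_col, grouping_col):
    # Median of the target column within each group, indexed by group key
    values = df.groupby(grouping_col)[missing_value_col].median()
    # Index by the group key so fillna can align the medians by label
    df = df.set_index(grouping_col)
    # Use the column that was passed in rather than the hard-coded df.other
    df[missing_value_col] = df[missing_value_col].fillna(values)
    return df.reset_index()

df = fill_na(df, 'other', 'groupId')
print(df)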
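
Since the task asks for a list of grouping columns, one possible alternative is groupby(...).transform('median'), which returns the per-group median already aligned with the original rows, so no index juggling is needed. This is only a sketch: the function name fill_with_group_median and the sample columns Sex, Pclass and Age are assumptions made for illustration.

```python
import pandas as pd

def fill_with_group_median(df, col, group_cols):
    """Return a copy of df with NaNs in col replaced by each group's median."""
    out = df.copy()  # leave the caller's frame untouched
    # transform('median') yields one value per row, aligned with out's index
    group_medians = out.groupby(group_cols)[col].transform('median')
    out[col] = out[col].fillna(group_medians)
    return out

# Hypothetical sample data, not the asker's dataset
people = pd.DataFrame({
    'Sex': ['m', 'm', 'm', 'f', 'f', 'f'],
    'Pclass': [1, 1, 1, 2, 2, 2],
    'Age': [20.0, None, 40.0, 10.0, 30.0, None],
})

filled = fill_with_group_median(people, 'Age', ['Sex', 'Pclass'])
print(filled)
```

A group whose values are all missing has no median, so its rows would stay NaN; a second fillna with the overall median could cover that case if needed.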