Grouping and computing the median in a pandas dataframe

Time:04-14

This is my task: write a function that accepts a dataframe, the name of a column with missing values, and a list of grouping columns, and returns the dataframe with the missing values filled in with the median value.

Here is what I tried to do:

def fillnull(set,col):
   val = {col:set[col].sum()/set[col].count()}
   set.fillna(val)
   return set

fillnull(titset,'Age')

My problem is that my function doesn't work. I also don't know how to compute the median or how to group inside this function. Here are screenshots of my dataframe and of the missing values in my dataset:

DATAFRAME

NaN Values

CodePudding user response:

def fillnull(set,col):
    set[col] = set[col].fillna(set[col].median())
    return set

fillnull(titset,'Age')

CodePudding user response:

Check whether this code works for you:

import pandas as pd

df = pd.DataFrame({
    'processId': range(100, 900, 100),
    'groupId': [1, 1, 2, 2, 3, 3, 4, 4],
    'other': [1, 2, 3, None, 3, 4, None, 9]
})

print(df)

def fill_na(df, missing_value_col, grouping_col):
    # Median of the target column within each group, aligned row by row with df
    group_medians = df.groupby(grouping_col)[missing_value_col].transform('median')
    # Replace only the missing entries with the median of their own group
    df[missing_value_col] = df[missing_value_col].fillna(group_medians)

    return df

fill_na(df, 'other', 'groupId')
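
Since the task asks for a list of grouping columns, here is a minimal sketch of the same idea with several grouping columns at once. The function name `fill_with_group_median`, the columns `Pclass` and `Sex`, and the sample values are assumptions made up for illustration (only `Age` appears in the question); swap in your own dataframe and column names.

```python
import pandas as pd


def fill_with_group_median(df, col, group_cols):
    """Return a copy of df with NaNs in `col` replaced by the per-group median."""
    out = df.copy()
    # transform('median') returns a Series aligned with the original rows,
    # holding the median of each row's group (NaNs are skipped when computing it)
    group_medians = out.groupby(group_cols)[col].transform('median')
    out[col] = out[col].fillna(group_medians)
    return out


# Made-up sample data, loosely modelled on the dataset in the question
sample = pd.DataFrame({
    'Pclass': [1, 1, 1, 3, 3, 3],
    'Sex': ['male', 'male', 'male', 'female', 'female', 'female'],
    'Age': [40.0, None, 50.0, 20.0, 30.0, None],
})

filled = fill_with_group_median(sample, 'Age', ['Pclass', 'Sex'])
print(filled)
```

Working on a copy leaves the input dataframe unchanged, so you have to assign the result, for example `titset = fill_with_group_median(titset, 'Age', [...])`. Note that if every value in a group is missing, that group's median is also NaN and those rows stay unfilled.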