pandas fill null values by the mean of that category (use loop?)

Time:12-24

I am trying to fill in the missing values in my data set with the average of the values observed in the same year. Writing one line per column takes a long time, and I can't get the same thing to work with a for loop. How should it be coded?

df['TOTAL_REVENUE'] = df.TOTAL_REVENUE.fillna(df.groupby('YEAR')['TOTAL_REVENUE'].transform('mean'))  
df['FEDERAL_REVENUE'] = df.FEDERAL_REVENUE.fillna(df.groupby('YEAR')['FEDERAL_REVENUE'].transform('mean'))  
df['STATE_REVENUE'] = df.STATE_REVENUE.fillna(df.groupby('YEAR')['STATE_REVENUE'].transform('mean'))   
df['TOTAL_EXPENDITURE'] = df.TOTAL_EXPENDITURE.fillna(df.groupby('YEAR')['TOTAL_EXPENDITURE'].transform('mean'))  

I know the loop below is wrong, but I wanted to show it as an example of what I'm trying to do.

for column in df.columns:
    df[column] = df.column.fillna(df.groupby('YEAR')[column].transform('mean'))  
    #df['TOTAL_REVENUE'] = df.TOTAL_REVENUE.fillna(df.groupby('YEAR')['TOTAL_REVENUE'].transform('mean'))  


CodePudding user response:

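Use df[column] instead of df.column. Attribute access looks for a column literally named "column", whereas bracket indexing uses the current value of the loop variable: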

for column in df.columns:
    df[column] = df[column].fillna(df.groupby('YEAR')[column].transform('mean'))
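
One caveat: looping over df.columns also visits YEAR itself and any text columns, and taking the mean of a text column can fail. Below is a minimal sketch of a variant that restricts the fill to numeric columns and drops the loop entirely. The toy data and the STATE column are made up for illustration; only YEAR, TOTAL_REVENUE and STATE_REVENUE come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real data set will look different.
df = pd.DataFrame({
    "STATE": ["A", "B", "C", "A", "B", "C"],
    "YEAR": [1992, 1992, 1992, 1993, 1993, 1993],
    "TOTAL_REVENUE": [10.0, np.nan, 30.0, 5.0, 15.0, np.nan],
    "STATE_REVENUE": [np.nan, 4.0, 6.0, 1.0, np.nan, 3.0],
})

# Only numeric columns can be averaged; also skip the grouping key.
value_cols = df.select_dtypes(include="number").columns.drop("YEAR").tolist()

# Per-year mean of each value column, aligned to the original rows.
year_means = df.groupby("YEAR")[value_cols].transform("mean")

# Passing a DataFrame to fillna fills each NaN from the matching row and column.
df[value_cols] = df[value_cols].fillna(year_means)

print(df)
```

If a year has no observed values at all for a column, its mean is NaN as well, so those cells stay missing.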