I am trying to fill in the missing values in my data set with the average of the values observed in the same year. Writing this out one column at a time takes a long time, and I can't get the same thing to work with a for loop. How should it be coded?
df['TOTAL_REVENUE'] = df.TOTAL_REVENUE.fillna(df.groupby('YEAR')['TOTAL_REVENUE'].transform('mean'))
df['FEDERAL_REVENUE'] = df.FEDERAL_REVENUE.fillna(df.groupby('YEAR')['FEDERAL_REVENUE'].transform('mean'))
df['STATE_REVENUE'] = df.STATE_REVENUE.fillna(df.groupby('YEAR')['STATE_REVENUE'].transform('mean'))
df['TOTAL_EXPENDITURE'] = df.TOTAL_EXPENDITURE.fillna(df.groupby('YEAR')['TOTAL_EXPENDITURE'].transform('mean'))
I know the loop below is wrong, but I wanted to show what I am trying to do:
for column in df.columns:
    df[column] = df.column.fillna(df.groupby('YEAR')[column].transform('mean'))
#df['TOTAL_REVENUE'] = df.TOTAL_REVENUE.fillna(df.groupby('YEAR')['TOTAL_REVENUE'].transform('mean'))
CodePudding user response:
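You'd do it like this (use df[column] instead of df.column). Attribute access with df.column looks for a column literally named "column", whereas df[column] uses the value of the loop variable: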
for column in df.columns:
    df[column] = df[column].fillna(df.groupby('YEAR')[column].transform('mean'))
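
One caveat with looping over every entry in df.columns: the loop also visits YEAR itself and any text columns, and taking the mean of a text column will typically raise an error. A minimal sketch that restricts the fill to numeric columns and does it in one pass is shown below. The helper name fill_with_year_mean, the STATE column, and the toy values are assumptions made up for illustration; only YEAR, TOTAL_REVENUE and TOTAL_EXPENDITURE come from the question.

```python
import numpy as np
import pandas as pd


def fill_with_year_mean(df, group_col="YEAR"):
    """Return a copy of df where NaNs in numeric columns are replaced
    by the mean of that column within the same group_col value."""
    out = df.copy()
    # Only numeric columns have a meaningful mean; skip the grouping column.
    numeric_cols = (
        out.select_dtypes(include="number")
        .columns.drop(group_col, errors="ignore")
        .tolist()
    )
    # transform('mean') returns values aligned row by row with the original frame.
    group_means = out.groupby(group_col)[numeric_cols].transform("mean")
    out[numeric_cols] = out[numeric_cols].fillna(group_means)
    return out


# Hypothetical toy data, not the asker's real data set.
df = pd.DataFrame({
    "YEAR": [2000, 2000, 2000, 2001, 2001],
    "STATE": ["A", "B", "C", "A", "B"],
    "TOTAL_REVENUE": [10.0, np.nan, 30.0, 5.0, np.nan],
    "TOTAL_EXPENDITURE": [1.0, 3.0, np.nan, np.nan, 8.0],
})

filled = fill_with_year_mean(df)
print(filled)
```

If every value for a given year is missing in some column, the group mean is itself NaN, so those cells stay empty and would need a separate fallback.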