pandas aggregate column doesn't exist?

Time:03-26

I have a DataFrame that I am performing a group by on with aggregate functions. These are the functions:

aggregation_functions = {
    '12_months': 'sum',
    '24_months': 'sum',
    '36_months': 'sum',
    'number_36_months': 'sum'
}

When I do the group by, it drops an ID column that is classed as a "nuisance" column.

But when I add an aggregate function for this ID column, I get the error:

[ERROR] 03/25/2022 12:24:44 PM - Column(s) ['id'] do not exist

This is the aggregation I'm trying to add, followed by the group by:

'id': 'nunique'
final_df = df.groupby(['buy_country', 'buy_activity', 'vd_country', 'vd_activity'], as_index=False).aggregate(aggregation_functions)

The column does exist in the DataFrame df.

Does anyone know why pandas thinks the column doesn't exist, or how to get the aggregate function for this column to work?

Example of the data:

id      buy_country  buy_activity  vd_country  vd_activity  number_of_buyers  36_months  24_months  12_months  number_36_months
000002  GB           Not Matched   GB          Not Matched  1                 0          0          0          1
000002  GB           Not Matched   GB          Not Matched  1                 0          0          0          4
000002  GB           Not Matched   GB          Not Matched  1                 0          0          0          2
000002  GB           Not Matched   GB          Not Matched  1                 0          0          0          1

CodePudding user response:

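Are you sure that id is a column and not an index? If id is the index, aggregate cannot find it, because the keys of the dictionary you pass are looked up among the columns only.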
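One quick way to check is to look at both the columns and the index names. This is only a minimal sketch: the small frame below is made up for illustration (only the column names come from the question), and it assumes id ended up as the index, which may not be what happened in your case.

```python
import pandas as pd

# Hypothetical frame: 'id' has been moved into the index (an assumption,
# not something the question confirms).
df = pd.DataFrame({
    "id": ["000002", "000002"],
    "buy_country": ["GB", "GB"],
    "number_36_months": [1, 4],
}).set_index("id")

# Is 'id' a regular column, or the name of an index level?
id_is_column = "id" in df.columns
id_is_index = "id" in df.index.names
print("column:", id_is_column, "index:", id_is_index)
```

If the first value is False and the second is True, id lives in the index rather than in the columns.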

You could try resetting the index of your DataFrame before you groupby:

df = df.reset_index()
final_df = df.groupby(['buy_country', 'buy_activity', 'vd_country', 'vd_activity'], as_index=False).aggregate(aggregation_functions)
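Here is a rough end-to-end sketch of that idea, using the sample rows from the question. It assumes id was set as the index somewhere upstream (that part is a guess) and that you are on a recent pandas version, where the missing key surfaces as a KeyError.

```python
import pandas as pd

group_cols = ["buy_country", "buy_activity", "vd_country", "vd_activity"]

# Sample rows from the question; 'id' is kept as a string to preserve the zeros.
df = pd.DataFrame({
    "id": ["000002"] * 4,
    "buy_country": ["GB"] * 4,
    "buy_activity": ["Not Matched"] * 4,
    "vd_country": ["GB"] * 4,
    "vd_activity": ["Not Matched"] * 4,
    "number_of_buyers": [1, 1, 1, 1],
    "36_months": [0, 0, 0, 0],
    "24_months": [0, 0, 0, 0],
    "12_months": [0, 0, 0, 0],
    "number_36_months": [1, 4, 2, 1],
}).set_index("id")  # assumption: 'id' ended up as the index

aggregation_functions = {
    "12_months": "sum",
    "24_months": "sum",
    "36_months": "sum",
    "number_36_months": "sum",
    "id": "nunique",
}

# With 'id' in the index, the dict key 'id' is not found among the columns.
try:
    df.groupby(group_cols, as_index=False).aggregate(aggregation_functions)
except KeyError as exc:
    print("aggregate failed:", exc)

# Moving 'id' back into the columns lets the same aggregation run.
df = df.reset_index()
final_df = df.groupby(group_cols, as_index=False).aggregate(aggregation_functions)
print(final_df)
```

If id is already a regular column in your data, reset_index will not help, and the cause is somewhere else (for example, a differently spelled or whitespace-padded column name).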