Currently I have a DataFrame that I am performing a groupby on with aggregate functions. These are the functions:
```python
aggregation_functions = {
    '12_months': 'sum',
    '24_months': 'sum',
    '36_months': 'sum',
    'number_36_months': 'sum'
}
```
When I do the groupby, it drops an ID column that is classed as a "nuisance" column, but when I add an aggregate function for this ID column I get the error:
[ERROR] 03/25/2022 12:24:44 PM - Column(s) ['id'] do not exist
This is the aggregation I'm trying to add, and this is the groupby:
```python
aggregation_functions['id'] = 'nunique'  # the entry I'm adding

final_df = df.groupby(['buy_country', 'buy_activity', 'vd_country', 'vd_activity'], as_index=False).aggregate(aggregation_functions)
```
The column does exist in the DataFrame df. Does anyone know why pandas thinks the column doesn't exist, or how to get the aggregate function for this column to work?
Example of the data:
id | buy_country | buy_activity | vd_country | vd_activity | number_of_buyers | 36_months | 24_months | 12_months | number_36_months |
---|---|---|---|---|---|---|---|---|---|
000002 | GB | Not Matched | GB | Not Matched | 1 | 0 | 0 | 0 | 1 |
000002 | GB | Not Matched | GB | Not Matched | 1 | 0 | 0 | 0 | 4 |
000002 | GB | Not Matched | GB | Not Matched | 1 | 0 | 0 | 0 | 2 |
000002 | GB | Not Matched | GB | Not Matched | 1 | 0 | 0 | 0 | 1 |
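A minimal sketch that rebuilds the four sample rows above (keeping id as a string so the leading zeros survive) and runs the same groupby with `'id': 'nunique'` added to the dictionary. It assumes id is an ordinary column of df; in that case the dictionary-based aggregate accepts it and returns one row per group with the distinct id count.

```python
import pandas as pd

# Sample rows copied from the table; 'id' is a string so "000002" keeps its zeros.
df = pd.DataFrame(
    {
        "id": ["000002"] * 4,
        "buy_country": ["GB"] * 4,
        "buy_activity": ["Not Matched"] * 4,
        "vd_country": ["GB"] * 4,
        "vd_activity": ["Not Matched"] * 4,
        "number_of_buyers": [1, 1, 1, 1],
        "36_months": [0, 0, 0, 0],
        "24_months": [0, 0, 0, 0],
        "12_months": [0, 0, 0, 0],
        "number_36_months": [1, 4, 2, 1],
    }
)

aggregation_functions = {
    "12_months": "sum",
    "24_months": "sum",
    "36_months": "sum",
    "number_36_months": "sum",
    "id": "nunique",  # count distinct ids in each group
}

group_cols = ["buy_country", "buy_activity", "vd_country", "vd_activity"]

# 'id' is a regular column here, so the aggregation dict can refer to it.
final_df = df.groupby(group_cols, as_index=False).aggregate(aggregation_functions)
print(final_df)
```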
Answer:
Are you sure that id is a column and not an index?
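One way to check this, sketched on a small hypothetical frame where id has been moved into the index with set_index("id") (reading a file with read_csv(..., index_col="id") would have the same effect). Here `"id" in df.columns` is False while df.index.names contains it, and the aggregation dictionary fails with the same "do not exist" message, because its keys are looked up among the columns only.

```python
import pandas as pd

# Hypothetical data: 'id' ends up as the index instead of a column.
df = pd.DataFrame(
    {
        "id": ["000002", "000002", "000003"],
        "buy_country": ["GB", "GB", "GB"],
        "number_36_months": [1, 4, 2],
    }
).set_index("id")

# Is 'id' a column, or an index level?
print("id" in df.columns)      # False
print("id" in df.index.names)  # True

# Aggregation dict keys must be columns, so this raises the same error.
error_message = ""
try:
    df.groupby(["buy_country"], as_index=False).aggregate(
        {"number_36_months": "sum", "id": "nunique"}
    )
except Exception as exc:  # a KeyError in recent pandas versions
    error_message = str(exc)
    print(error_message)
```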
You could try resetting the index of your DataFrame before the groupby:
```python
df = df.reset_index()
final_df = df.groupby(['buy_country', 'buy_activity', 'vd_country', 'vd_activity'], as_index=False).aggregate(aggregation_functions)
```
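A sketch of that fix on a similar hypothetical frame (the rows and ids are made up): once reset_index() turns the named id index back into a column, the `'id': 'nunique'` entry is accepted.

```python
import pandas as pd

# Hypothetical data with 'id' stored as the index rather than a column.
df = pd.DataFrame(
    {
        "id": ["000002", "000002", "000003"],
        "buy_country": ["GB"] * 3,
        "buy_activity": ["Not Matched"] * 3,
        "vd_country": ["GB"] * 3,
        "vd_activity": ["Not Matched"] * 3,
        "number_36_months": [1, 4, 2],
    }
).set_index("id")

aggregation_functions = {
    "number_36_months": "sum",
    "id": "nunique",
}

# Move the named 'id' index back into an ordinary column.
df = df.reset_index()

final_df = df.groupby(
    ["buy_country", "buy_activity", "vd_country", "vd_activity"], as_index=False
).aggregate(aggregation_functions)
print(final_df)
```

This only helps if id really is a named index level. On a frame with a default unnamed index, reset_index() just adds a column called index, so if the error persists it is worth printing list(df.columns) to compare the exact column names.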