I would like to sum the energy values for US, China and Japan and label this 'group1' Then groupby date, country, type and sum the energy values.
Data
We are grouping by date, and type and taking the sum of these specific countries: US, China and Japan - renaming this combination as group1
date country type energy
8/1/2022 US aa 10
8/1/2022 US aa 11
8/1/2022 China bb 50
8/1/2022 Japan bb 20
10/1/2022 Australia bb 5
Desired
date country type energy
8/1/2022 group1 aa 21
8/1/2022 group1 bb 70
10/1/2022 Australia bb 5
Doing
df.groupby(['country','date', 'type'], as_index=False).agg({'energy': sum})
The above script performs the groupby and sum perfectly, but unsure of how to condense certain categories into a group before doing this step.
Any suggestion is appreciated
CodePudding user response:
If you want to combine them, how about just filter first and change the value to group1
and then do the groupby?
df.loc[df['country'].isin(['US', 'China', 'Japan']), 'country'] = 'group1'
df.groupby(['date', 'type', 'country'], as_index=False, sort=False).agg({'energy': sum})
date type country energy
0 8/1/2022 aa group1 21
1 8/1/2022 bb group1 70
2 10/1/2022 bb Australia 5
CodePudding user response:
# define a dictionary to group the countries
d={'US': 'Group-1',
'China':'Group-1',
'Japan' :'Group-1'}
# create a group column, based on mapping
# keeping a separate column, to avoid losing original values
# it can very well be a country
df['group']=df['country'].map(d).fillna(df['country'])
# do a groupby
out= df.groupby(['group','date', 'type'], as_index=False).agg({'energy': sum})
group date type energy
0 Australia 10/1/2022 bb 5
1 Group-1 8/1/2022 aa 21
2 Group-1 8/1/2022 bb 70