Home > Blockchain >  How to efficiently aggregate by segments in Python
How to efficiently aggregate by segments in Python

Time:03-25

I'm looking to find an alternate way to group "total" groupings. Below is over simplified example of what I'm currently doing to create a total grouping. Currently, I'm creating a separate "total" data frame and merging it with the original one. Is there a better way to approach this problem?

import pandas as pd
df = pd.DataFrame({'new_ren':['new', 'ren', 'new', 'new', 'ren'],
                   'dt':['2022-01-01', '2022-01-01', '2022-02-01', '2022-02-01', '2022-03-01'],
                   'ct':[1, 3, 4, 1, 2]})

# create total grouping
df_tot = df.groupby('dt')['ct'].sum().reset_index()
df_tot['new_ren'] = 'total'

# combined both dataframes
pd.concat([df, df_tot])

CodePudding user response:

There's nothing wrong with your approach (I'd say, it's the most intuitive and readable for your current problem) but here's an alternative way using merge to get the same output:

out = df.merge(df.groupby('dt', as_index=False).sum('ct').assign(new_ren='total'), how='outer')

Output:

  new_ren          dt  ct
0     new  2022-01-01   1
1     ren  2022-01-01   3
2     new  2022-02-01   4
3     new  2022-02-01   1
4     ren  2022-03-01   2
5   total  2022-01-01   4
6   total  2022-02-01   5
7   total  2022-03-01   2
  • Related