python pandas: Can you perform multiple operations in a groupby?

Time:03-04

Suppose I have the following DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'year': [2015,2015,2018,2018,2020],
        'total': [100,200,50,150,400],
        'tax': [10,20,5,15,40]
    }
)

I want to sum up the total and tax columns by year and obtain the size at the same time.

The following code gives me the sum of the two columns:

df_total_tax = df.groupby('year', as_index=False)
df_total_tax = df_total_tax[['total','tax']].apply(np.sum)

However, I can't figure out how to also include a column for size at the same time. Must I perform a different groupby, then use .size() and then append that column to df_total_tax? Or is there an easier way?

The end result would have one row per year, with columns for year, total, tax and size.

Thanks

CodePudding user response:

With named aggregation you can specify a separate aggregate function for each output column:

df = df.groupby('year', as_index=False).agg(total=('total','sum'),
                                            tax=('tax','sum'),
                                            size=('tax', 'size'))
print(df)
   year  total  tax  size
0  2015    300   30     2
1  2018    200   20     2
2  2020    400   40     1
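
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same named aggregation on the question's data. The variable name `result` is my own choice for illustration; it keeps the original `df` intact instead of overwriting it.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'year': [2015, 2015, 2018, 2018, 2020],
        'total': [100, 200, 50, 150, 400],
        'tax': [10, 20, 5, 15, 40],
    }
)

# Named aggregation: each keyword becomes an output column, and each
# value is an (input column, aggregate function) pair.
result = df.groupby('year', as_index=False).agg(
    total=('total', 'sum'),
    tax=('tax', 'sum'),
    size=('tax', 'size'),  # number of rows in each group
)
print(result)
```

Note that 'size' counts every row in the group, including rows where the value is NaN, while 'count' would skip NaN values. Also, because DataFrame already has a `size` attribute, read the new column with `result['size']` rather than `result.size`.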