Aggregation with sum based on condition-CodePudding

I have a DataFrame like this:

df = pd.DataFrame(data= {'month' : [2,7,4,8], 'sales' : [10,40,70,50]})

I would like to get the sum of sales aggregated by the month. However, I want to have two groups of month combined, the first for months 1-6 (resulting in sales of 80) and the second for the months 7-12 (resulting in 90).

What's the best way to do this?

CodePudding user response：

One way to do this is to create a column that acts as a grouping key. This can be done like so:

import numpy as np
import pandas as pd

df = pd.DataFrame(data= {'month': [2, 7, 4, 8], 'sales' : [10, 40, 70, 50]})
df["foo"] = np.where(df['month'] < 7, 0, 1)
bar = df.groupby(['foo']).sum()

Here, a foo column is being created which will assign a group to each column depending on the condition you defined. i.e df['month'] < 7. Then using this created column you can perform a classic groupby() and obtain the sum.

Note you can also use df.groupby(['foo'])['sales'].agg('sum') if you only want to keep the sales column.

CodePudding user response：

You can use pd.cut to assign labels to the months and use these in a groupby:

>>> df.groupby(pd.cut(df["month"], bins=[0, 6, 12], labels=["1-6", "7-12"]))["sales"].sum()

month
1-6     80
7-12    90
Name: sales, dtype: int64