I have a DataFrame like this:
df = pd.DataFrame(data= {'month' : [2,7,4,8], 'sales' : [10,40,70,50]})
I would like to get the sum of sales aggregated by the month. However, I want to have two groups of month combined, the first for months 1-6 (resulting in sales of 80) and the second for the months 7-12 (resulting in 90).
What's the best way to do this?
CodePudding user response:
One way to do this is to create a column that acts as a grouping key. This can be done like so:
import numpy as np
import pandas as pd
df = pd.DataFrame(data= {'month': [2, 7, 4, 8], 'sales' : [10, 40, 70, 50]})
df["foo"] = np.where(df['month'] < 7, 0, 1)
bar = df.groupby(['foo']).sum()
Here, a foo
column is being created which will assign a group to each column depending on the condition you defined. i.e df['month'] < 7
. Then using this created column you can perform a classic groupby()
and obtain the sum.
Note you can also use df.groupby(['foo'])['sales'].agg('sum')
if you only want to keep the sales column.
CodePudding user response:
You can use pd.cut
to assign labels to the months and use these in a groupby
:
>>> df.groupby(pd.cut(df["month"], bins=[0, 6, 12], labels=["1-6", "7-12"]))["sales"].sum()
month
1-6 80
7-12 90
Name: sales, dtype: int64