How to Explicitly specify the groups in Pandas groupby

Time:08-31

I have a DataFrame containing customer purchase information. I would like to group by the date column to determine the total sales for each day. My problem is that there are some days with no purchases whatsoever, and I would like the groupby result to include those missing days as groups with total sales equal to 0. Is there a way to pass a list of the expected values of the date column to the groupby function? A minimal working example (MWE) is below:

>>> import pandas as pd
>>> df = pd.DataFrame({
...     'purchase_id': [1, 2, 3, 4],
...     'date': ['1900-01-01', '1900-01-01', '1900-01-03', '1900-01-04'],
...     'cost': [1.00, 0.25, 0.50, 0.75]
... })

This groupby gives the following result:

>>> df.groupby('date').agg({'cost':'sum'})

            cost
date
1900-01-01  1.25
1900-01-03  0.50
1900-01-04  0.75

What command can I execute to obtain the following result instead? (Obviously I will have to pass the dates I am expecting to see as an argument, which is fine.)

            cost
date
1900-01-01  1.25
1900-01-02  0.00
1900-01-03  0.50
1900-01-04  0.75

Answer:

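You can use reindex, passing fill_value=0 so that the missing days get a total of 0 instead of NaN:

df.groupby('date').agg({'cost': 'sum'}).reindex(your_new_date_list, fill_value=0)

Here your_new_date_list is the list of dates you expect to see. Its values must have the same type as the grouped index, so while the date column holds strings they must be strings too, e.g. ['1900-01-01', '1900-01-02', '1900-01-03', '1900-01-04'].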
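As a self-contained sketch of the reindex approach, the expected dates can be generated with pd.date_range instead of being typed out by hand. The start and end dates below are assumptions taken from the example data. Because the date column still holds strings here, the generated dates are formatted back to 'YYYY-MM-DD' strings so that they match the groupby index, and converted to a plain list so reindex keeps the index name 'date'.

```python
import pandas as pd

df = pd.DataFrame({
    'purchase_id': [1, 2, 3, 4],
    'date': ['1900-01-01', '1900-01-01', '1900-01-03', '1900-01-04'],
    'cost': [1.00, 0.25, 0.50, 0.75],
})

# Every day we expect to see (assumed range), as strings to match the 'date' values.
expected_dates = (
    pd.date_range('1900-01-01', '1900-01-04', freq='D')
      .strftime('%Y-%m-%d')
      .tolist()
)

# Sum per day, then add the missing days with a total of 0 instead of NaN.
daily = (
    df.groupby('date')
      .agg({'cost': 'sum'})
      .reindex(expected_dates, fill_value=0)
)
print(daily)
```

If you later convert the column with pd.to_datetime, drop the strftime step so the expected dates stay as datetimes.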

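However, I'd recommend converting the date column to datetime first; then you can use resample, which creates a bin for every day between the earliest and latest date, so days without purchases get a sum of 0 automatically: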

df['date'] = pd.to_datetime(df['date'])

df.resample('D', on='date')['cost'].sum().reset_index()

Output:

        date  cost
0 1900-01-01  1.25
1 1900-01-02  0.00
2 1900-01-03  0.50
3 1900-01-04  0.75
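Another option, closer to what the question literally asks (passing the expected dates to groupby itself), is to group by a pandas Categorical whose categories are the expected dates. This is an alternative technique sketched here, not part of the answer above, and the expected_dates list is an assumption you would supply. With observed=False, groupby produces a group for every category, including unobserved ones, and sum returns 0 for those empty groups.

```python
import pandas as pd

df = pd.DataFrame({
    'purchase_id': [1, 2, 3, 4],
    'date': ['1900-01-01', '1900-01-01', '1900-01-03', '1900-01-04'],
    'cost': [1.00, 0.25, 0.50, 0.75],
})

# The groups we want, in order, including days with no purchases (assumed list).
expected_dates = ['1900-01-01', '1900-01-02', '1900-01-03', '1900-01-04']

# Wrap the key in a Categorical so that unobserved categories still form groups.
date_key = pd.Categorical(df['date'], categories=expected_dates)

daily = (
    df.groupby(date_key, observed=False)['cost']
      .sum()
      .rename_axis('date')  # grouping by an array leaves the index unnamed
      .reset_index()
)
print(daily)
```

Unlike resample, this also works for dates outside the observed range; dates in the data that are missing from expected_dates become NaN in the Categorical and are dropped from the result.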