I have a DataFrame containing customer purchase information. I would like to groupby the date column to determine the total sales each day. My problem is that there are some days with no purchases whatsoever. I would like the groupby object to include those missing days as groups with the total sales equal to 0. Is there a way to pass a list of the values of the date column to the groupby function? MWE below:
>>> import pandas as pd
>>> df = pd.DataFrame({
... 'purchase_id': [1, 2, 3, 4],
... 'date' : ['1900-01-01', '1900-01-01', '1900-01-03', '1900-01-04'],
... 'cost' : [1.00, 0.25, 0.50, 0.75]
... })
This groupby produces the following result:
>>> df.groupby('date').agg({'cost':'sum'})
date cost
'1900-01-01' 1.25
'1900-01-03' 0.50
'1900-01-04' 0.75
What command can I execute to obtain the following result instead? (Obviously I will have to pass the dates I am expecting to see as an argument, which is fine.)
date cost
'1900-01-01' 1.25
'1900-01-02' 0.00
'1900-01-03' 0.50
'1900-01-04' 0.75
CodePudding user response:
You can use reindex, passing fill_value=0 so that the missing dates get 0 instead of NaN:
df.groupby('date').agg({'cost':'sum'}).reindex(your_new_date_list, fill_value=0)
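As a minimal sketch of the reindex approach on the question's data: the name expected_dates and the use of pd.date_range to build the list are assumptions for illustration, and any list of date strings would work just as well. The dates are formatted as strings so that they match the string-typed date column.

```python
import pandas as pd

df = pd.DataFrame({
    'purchase_id': [1, 2, 3, 4],
    'date': ['1900-01-01', '1900-01-01', '1900-01-03', '1900-01-04'],
    'cost': [1.00, 0.25, 0.50, 0.75],
})

# Hypothetical list of expected dates, as strings to match the 'date' column.
expected_dates = pd.date_range('1900-01-01', '1900-01-04', freq='D').strftime('%Y-%m-%d')

daily = (
    df.groupby('date')
      .agg({'cost': 'sum'})
      .reindex(expected_dates, fill_value=0)  # dates with no purchases get 0
      .rename_axis('date')                    # reindex drops the index name
)
```

Without fill_value, the row for 1900-01-02 would hold NaN, so a trailing .fillna(0) would be the alternative.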
However, I'd recommend you convert your data to datetime type, then you can use resample:
df['date'] = pd.to_datetime(df['date'])
df.resample('D', on='date')['cost'].sum().reset_index()
Output:
date cost
0 1900-01-01 1.25
1 1900-01-02 0.00
2 1900-01-03 0.50
3 1900-01-04 0.75
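Note that resample only fills gaps between the first and last dates present in the data. If the dates you expect extend beyond that range, one option is to combine both ideas and reindex the resampled result. The sketch below assumes a hypothetical reporting window (the name window and its endpoints are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'purchase_id': [1, 2, 3, 4],
    'date': ['1900-01-01', '1900-01-01', '1900-01-03', '1900-01-04'],
    'cost': [1.00, 0.25, 0.50, 0.75],
})
df['date'] = pd.to_datetime(df['date'])

# Hypothetical window that starts before and ends after the observed purchases.
window = pd.date_range('1899-12-31', '1900-01-05', freq='D', name='date')

daily = (
    df.resample('D', on='date')['cost']
      .sum()
      .reindex(window, fill_value=0)  # add the days outside the observed range
      .reset_index()
)
```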