I have a dataframe that is a time series of 7 years of data. I have an index that is a timestamp
and a column (let's call it sales
) for every store
. Each store has its own time series of sales
.
I am trying to resample and sum all data to a monthly visualization like so:
df = df.groupby('store').resample('M').sum()
This indeed groups data by month, but it takes into account the year. I.e., it treats 'December 2010' like a different month from 'December 2011'. I ended up having 7 * 12
rows instead of only 12
rows.
I'd like to sum all months of the 7 years and group them in 12 months of sales.
Minimal reproducible example
index = pd.date_range('1/1/2000', periods=730, freq='D') #2 years of daily data
series = pd.Series(range(730), index=index) #just dummy data
series # would return a index with 730 values (2 years)
series.resample('M').sum() #this returns 24 values, representing each month, which doesn't work for me.
Thanks
CodePudding user response:
You'll probably want to use a df and add a column by applying a function to the date to get just the month. You can probably also do this by apply that function within the groupby, but I'm not sure how that would work and this methodology will get you the results you want
import pandas as pd
dates = pd.date_range('1/1/2000', periods=730, freq='D') #2 years of daily data
series = pd.Series(range(730)) #just dummy data
# make temp df
d = {'date': dates, 'temp': series}
df = pd.DataFrame(d)
# add col just for month
df['month_num'] = df.apply(lambda row: str(row['date']).split('-')[1], axis=1)
print(df)
# get sum for each month
print(df.groupby('month_num')['temp'].sum())
df generated:
date temp month_num
0 2000-01-01 0 01
1 2000-01-02 1 01
2 2000-01-03 2 01
3 2000-01-04 3 01
4 2000-01-05 4 01
.. ... ... ...
725 2001-12-26 725 12
726 2001-12-27 726 12
727 2001-12-28 727 12
728 2001-12-29 728 12
729 2001-12-30 729 12
[730 rows x 3 columns]
output of the groupby month_num sum():
month_num
01 12276
02 12799
03 15965
04 17280
05 19747
06 20940
07 23529
08 25451
09 26460
10 29233
11 30120
12 32285
Name: temp, dtype: int64
CodePudding user response:
Try this, using the month
attribute of pd.DatetimeIndex
:
series.groupby(series.index.month).sum()
Output:
1 12276
2 12799
3 15965
4 17280
5 19747
6 20940
7 23529
8 25451
9 26460
10 29233
11 30120
12 32285
dtype: int64