Home > database >  Pandas how to resample monthly using data from more than a year
Pandas how to resample monthly using data from more than a year

Time:07-16

I have a dataframe that is a time series of 7 years of data. I have an index that is a timestamp and a column (let's call it sales) for every store. Each store has its own time series of sales.

I am trying to resample and sum all data to a monthly visualization like so:

df = df.groupby('store').resample('M').sum()

This indeed groups data by month, but it takes into account the year. I.e., it treats 'December 2010' like a different month from 'December 2011'. I ended up having 7 * 12 rows instead of only 12 rows.

I'd like to sum all months of the 7 years and group them in 12 months of sales.

Minimal reproducible example

index = pd.date_range('1/1/2000', periods=730, freq='D') #2 years of daily data
series = pd.Series(range(730), index=index) #just dummy data
series # would return a index with 730 values (2 years)

series.resample('M').sum() #this returns 24 values, representing each month, which doesn't work for me.

Thanks

CodePudding user response:

You'll probably want to use a df and add a column by applying a function to the date to get just the month. You can probably also do this by apply that function within the groupby, but I'm not sure how that would work and this methodology will get you the results you want

import pandas as pd

dates = pd.date_range('1/1/2000', periods=730, freq='D') #2 years of daily data
series = pd.Series(range(730)) #just dummy data

# make temp df
d = {'date': dates, 'temp': series}
df = pd.DataFrame(d)

# add col just for month
df['month_num'] = df.apply(lambda row: str(row['date']).split('-')[1], axis=1)

print(df)

# get sum for each month
print(df.groupby('month_num')['temp'].sum())

df generated:

          date  temp month_num
0   2000-01-01     0        01
1   2000-01-02     1        01
2   2000-01-03     2        01
3   2000-01-04     3        01
4   2000-01-05     4        01
..         ...   ...       ...
725 2001-12-26   725        12
726 2001-12-27   726        12
727 2001-12-28   727        12
728 2001-12-29   728        12
729 2001-12-30   729        12
[730 rows x 3 columns]

output of the groupby month_num sum():

month_num
01    12276
02    12799
03    15965
04    17280
05    19747
06    20940
07    23529
08    25451
09    26460
10    29233
11    30120
12    32285
Name: temp, dtype: int64

CodePudding user response:

Try this, using the month attribute of pd.DatetimeIndex:

series.groupby(series.index.month).sum()

Output:

1     12276
2     12799
3     15965
4     17280
5     19747
6     20940
7     23529
8     25451
9     26460
10    29233
11    30120
12    32285
dtype: int64
  • Related