I want to group data based on each week and found the solution in the following post, https://www.statology.org/pandas-group-by-week/
According to the post, following the two operations below allows you to group the data per week.
# Create a dataframe with datetime values
df = pd.DataFrame({'date': pd.date_range(start='1/5/2022', freq='D', periods=15),
'sales': [6, 8, 9, 5, 4, 8, 8, 3, 5, 9, 8, 3, 4, 7, 7]})
#convert date column to datetime and subtract one week
df['date'] = pd.to_datetime(df['date']) - pd.to_timedelta(7, unit='d')
#calculate sum of values, grouped by week
df.groupby([pd.Grouper(key='date', freq='W')])['sales'].sum()
The question I have is whats the point of operation 1, where you subtract 7 days from the original date. How does this help with the grouping as all the dates are translated 7 days back? Couldn't you just use the second operation to group the dates ?
Thanks
CodePudding user response:
This is likely a trick to set the week start as reference, the best would be to use directly the label='left'
parameter of pandas.Grouper
:
df.groupby(pd.Grouper(key='date', freq='W', label='left'))['sales'].sum()
Output:
date
2022-01-02 32
2022-01-09 44
2022-01-16 18
Freq: W-SUN, Name: sales, dtype: int64
alternative
df.groupby(pd.to_datetime(df['date']).dt.to_period('W'))['sales'].sum()
Output:
2022-01-03/2022-01-09 32
2022-01-10/2022-01-16 44
2022-01-17/2022-01-23 18
Freq: W-SUN, Name: sales, dtype: int64