I have this data frame with lot of missing dates in between
df = pd.DataFrame({'date':['2021-12-1','2021-12-2','2021-12-21','2021-12-1','2021-12-7','2021-12-1','2021-12-5','2021-12-1','2021-12-5'],
'id1':['a1','a1','a1','a1','a1','a2','a2','a2','a2'],
'id2':['b1','b1','b1','b2','b2','b3','b3','b4','b4'],
'value1':[1,5,7,2,9,3,0,1,7],
'value2':[6,2,8,1,9,3,0,2,6]})
Which looks like this:
date id1 id2 value1 value2
0 2021-12-1 a1 b1 1 6
1 2021-12-2 a1 b1 5 2
2 2021-12-21 a1 b1 7 8
3 2021-12-1 a1 b2 2 1
4 2021-12-7 a1 b2 9 9
5 2021-12-1 a2 b3 3 3
6 2021-12-5 a2 b3 0 0
7 2021-12-1 a2 b4 1 2
8 2021-12-5 a2 b4 7 6
I want my output to look like this where the frequency is changed from daily to weekly and the week starts from Monday.
date id1 id2 value1 value2
0 2021-12-6 a1 b1 6 8
1 2021-12-13 a1 b1 0 0
2 2021-12-20 a1 b1 0 0
3 2021-12-27 a1 b1 7 8
4 2021-12-6 a1 b2 2 1
5 2021-12-13 a1 b2 9 9
6 2021-12-6 a2 b3 3 3
7 2021-12-6 a2 b4 8 8
Firstly I am filling missing dates with zero values and then in the second step converting daily data to weekly data using resample
. Here I am using W-Mon
which means I starting my week from Monday.
#Filling missing dates values with zero
df['date'] = pd.to_datetime(df['date'])
df = (df.set_index('date')
.groupby(['id1','id2'])['value1','value2']
.apply(lambda x: x.asfreq('d', fill_value=0))
.reset_index()
[['date','id1','id2','value1','value2']])
#convert to weekly data and set monday as starting day for each week
df = (df.groupby(['id1','id2'])
.resample('W-Mon', label='right', closed = 'left', on='date')
.agg({'value1':'sum',"value2":'sum'} )
.reset_index())
I am not getting the expected output with the above code.
I am getting output like this
id1 id2 date value1 value2
0 a1 b1 2021-12-06 6 8
1 a1 b1 2021-12-13 7 8
2 a1 b2 2021-12-13 11 10
3 a2 b3 2021-12-13 3 3
4 a2 b3 2021-12-20 0 0
5 a2 b4 2021-12-20 1 2
6 a2 b4 2021-12-27 7 6
For groups, a2
and b4
is having dates 2021-12-27
even though in original data we don't have any dates for that week.
CodePudding user response:
The code works using the latest version of pandas.
Update your pandas version.
(It's good code, by the way!)