Home > Enterprise >  Fill missing dates in group and convert data to weekly
Fill missing dates in group and convert data to weekly

Time:12-20

I have this data frame with lot of missing dates in between

df = pd.DataFrame({'date':['2021-12-1','2021-12-2','2021-12-21','2021-12-1','2021-12-7','2021-12-1','2021-12-5','2021-12-1','2021-12-5'],
                   'id1':['a1','a1','a1','a1','a1','a2','a2','a2','a2'],
                   'id2':['b1','b1','b1','b2','b2','b3','b3','b4','b4'],
                   'value1':[1,5,7,2,9,3,0,1,7],
                   'value2':[6,2,8,1,9,3,0,2,6]})

Which looks like this:

         date id1 id2  value1  value2
0   2021-12-1  a1  b1       1       6
1   2021-12-2  a1  b1       5       2
2  2021-12-21  a1  b1       7       8
3   2021-12-1  a1  b2       2       1
4   2021-12-7  a1  b2       9       9
5   2021-12-1  a2  b3       3       3
6   2021-12-5  a2  b3       0       0
7   2021-12-1  a2  b4       1       2
8   2021-12-5  a2  b4       7       6

I want my output to look like this where the frequency is changed from daily to weekly and the week starts from Monday.

         date id1 id2  value1  value2
0   2021-12-6  a1  b1       6       8
1  2021-12-13  a1  b1       0       0
2  2021-12-20  a1  b1       0       0
3  2021-12-27  a1  b1       7       8
4   2021-12-6  a1  b2       2       1
5  2021-12-13  a1  b2       9       9
6   2021-12-6  a2  b3       3       3
7   2021-12-6  a2  b4       8       8

Firstly I am filling missing dates with zero values and then in the second step converting daily data to weekly data using resample. Here I am using W-Mon which means I starting my week from Monday.

#Filling missing dates values with zero
df['date'] = pd.to_datetime(df['date'])
df = (df.set_index('date')
      .groupby(['id1','id2'])['value1','value2']
      .apply(lambda x: x.asfreq('d', fill_value=0))
      .reset_index()
      [['date','id1','id2','value1','value2']])
#convert to weekly data and set monday as starting day for each week
df = (df.groupby(['id1','id2'])
       .resample('W-Mon', label='right', closed = 'left', on='date')
       .agg({'value1':'sum',"value2":'sum'} )
       .reset_index())

I am not getting the expected output with the above code.

I am getting output like this

  id1 id2       date  value1  value2
0  a1  b1 2021-12-06       6       8
1  a1  b1 2021-12-13       7       8
2  a1  b2 2021-12-13      11      10
3  a2  b3 2021-12-13       3       3
4  a2  b3 2021-12-20       0       0
5  a2  b4 2021-12-20       1       2
6  a2  b4 2021-12-27       7       6

For groups, a2 and b4 is having dates 2021-12-27 even though in original data we don't have any dates for that week.

CodePudding user response:

The code works using the latest version of pandas.

Update your pandas version.

(It's good code, by the way!)

  • Related