Home > Software design >  Pandas - rolling sum with missing date
Pandas - rolling sum with missing date

Time:11-03

My data is like this

group_name   group_date    value
A            2021-01-01     1
A            2021-01-02     5
A            2021-01-03     3
A            2021-01-04     1.5
A            2021-01-05     3
B            2021-01-01     1
B            2021-01-04     2
B            2021-01-05     9

What I want is to aggregate the value of each group (group_name group_date) during the past 2 days (today and yesterday), and show the missing dates as well. The first value of each group can be NaN or itself, it does not really matter. Below is the ideal result:

group_name   group_date    value
A            2021-01-01     NaN
A            2021-01-02     6
A            2021-01-03     8
A            2021-01-04     4.5
A            2021-01-05     4.5
B            2021-01-01     NaN
B            2021-01-02     1
B            2021-01-03     0
B            2021-01-04     2
B            2021-01-05     11

I wonder whether there is anyway to do this efficiently in Pandas

CodePudding user response:

Try asfreq and rolling:

# this assumes that `group_date` is `datetime` type
(df.set_index('group_date')
   .groupby('group_name')['value']
   .apply(lambda x: x.asfreq('1D').rolling('2D').sum())
   .reset_index()
)

Output:

  group_name group_date  value
0          A 2021-01-01    1.0
1          A 2021-01-02    6.0
2          A 2021-01-03    8.0
3          A 2021-01-04    4.5
4          A 2021-01-05    4.5
5          B 2021-01-01    1.0
6          B 2021-01-02    1.0
7          B 2021-01-03    NaN
8          B 2021-01-04    2.0
9          B 2021-01-05   11.0
  • Related