My data looks like this:
group_name  group_date  value
A           2021-01-01    1
A           2021-01-02    5
A           2021-01-03    3
A           2021-01-04    1.5
A           2021-01-05    3
B           2021-01-01    1
B           2021-01-04    2
B           2021-01-05    9
What I want is to aggregate the value of each group_name over the past 2 days of group_date (today and yesterday), and to show the missing dates as well. The first value of each group can be NaN or the value itself; it does not really matter. Below is the ideal result:
group_name  group_date  value
A           2021-01-01    NaN
A           2021-01-02    6
A           2021-01-03    8
A           2021-01-04    4.5
A           2021-01-05    4.5
B           2021-01-01    NaN
B           2021-01-02    1
B           2021-01-03    0
B           2021-01-04    2
B           2021-01-05   11
I wonder whether there is any way to do this efficiently in Pandas.
CodePudding user response:
Try asfreq and rolling:
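For reference, here is a minimal, runnable setup for the sample data from the question (the pd.to_datetime conversion is an assumption, in case group_date is still stored as strings):

import pandas as pd

# sample data from the question; group_date is parsed to datetime,
# which the asfreq/rolling approach below requires
df = pd.DataFrame({
    'group_name': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'group_date': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03',
                                  '2021-01-04', '2021-01-05',
                                  '2021-01-01', '2021-01-04', '2021-01-05']),
    'value': [1, 5, 3, 1.5, 3, 1, 2, 9],
})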
# this assumes that `group_date` is `datetime` type
(df.set_index('group_date')
   .groupby('group_name')['value']
   .apply(lambda x: x.asfreq('1D').rolling('2D').sum())
   .reset_index()
)
Output:
   group_name group_date  value
0           A 2021-01-01    1.0
1           A 2021-01-02    6.0
2           A 2021-01-03    8.0
3           A 2021-01-04    4.5
4           A 2021-01-05    4.5
5           B 2021-01-01    1.0
6           B 2021-01-02    1.0
7           B 2021-01-03    NaN
8           B 2021-01-04    2.0
9           B 2021-01-05   11.0
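This output leaves B on 2021-01-03 as NaN because the whole 2-day window is missing after asfreq. If the 0 shown in the question's ideal result is preferred, one small variation (not part of the original answer) is to fill the reindexed gaps with 0 via asfreq's fill_value argument before rolling:

# same approach, but missing dates are filled with 0 so that
# fully-missing windows sum to 0 instead of NaN
(df.set_index('group_date')
   .groupby('group_name')['value']
   .apply(lambda x: x.asfreq('1D', fill_value=0).rolling('2D').sum())
   .reset_index()
)

With this, B's 2021-01-03 row becomes 0.0 and the first row of each group becomes the value itself, which the question says is acceptable.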