Imagine a dataset like below:
result country start end
5 A 2/14/2022 2/21/2022
10 A 2/21/2022 2/28/2022
30 B 2/28/2022 3/7/2022
50 C 1/3/2022 1/10/2022
60 C 1/10/2022 1/17/2022
70 D 1/17/2022 1/24/2022
40 E 1/24/2022 1/31/2022
20 E 1/31/2022 2/7/2022
30 A 2/7/2022 2/14/2022
20 B 2/14/2022 2/21/2022
Expected output
I need to do groupby (country, start, and end) and the result column should add existing value with the above value and need to populate the average column. For example:
groupby country, start, and end with result and average column is nothing but 5, 5 10/2, 10 30/2, 30 50/2,50 60/2
result average
5 5 eg: (5)
10 7.5 (5 10/2) #resultcol of existingvalue abovevalue divided by 2 = average
30 20 (10 30/2)
50 40 (30 50/2)
60 55 (50 60/2)
70 65 ...
40 55 ...
20 30 ...
30 25 ...
20 25 ...
CodePudding user response:
Try this solution with grouping by country and date, however it may raise error if there is no sufficient data in a subset (i.e. larger than 2):
df_data['average'] = df_data.groupby(['country', 'date'])['result'].rolling(2, min_periods=1).mean().reset_index(0, drop=True)
In case you want to group by country only
df_data['average'] = df_data.groupby(['country'])['result'].rolling(2, min_periods=1).mean().reset_index(0, drop=True)
df_data
country date result average
0 A 2/14/2022 5 5.0
1 A 2/21/2022 10 7.5
2 B 2/28/2022 30 30.0
3 C 1/3/2022 50 50.0
4 C 1/10/2022 60 55.0
5 D 1/17/2022 70 70.0
6 E 1/24/2022 40 40.0
7 E 1/31/2022 20 30.0
8 A 2/7/2022 30 20.0
9 B 2/14/2022 20 25.0