I have a DataFrame with three columns:
For each row, I want to take the average of the past n (n>= 90) data points with the same column value (i.e., average over all dates less than the current date).
At below, I provided an example. col value is 'A' from row 5 to 55.
dates col value
0 2020-01-01 B 1
1 2020-01-02 B 10
2 2020-01-03 B 1
3 2020-01-04 A 1
4 2020-01-05 A 1
.. ... .. ...
56 2020-02-26 B 1
57 2020-02-27 B 1
58 2020-02-28 B 1
59 2020-02-29 A 1
60 2020-03-01 A 1
Desired output
dates col value
0 2020-01-01 B 1
1 2020-01-02 B 1
2 2020-01-03 B 11/2
3 2020-01-04 A 1
4 2020-01-05 A 1
.. ... .. ...
56 2020-02-26 B 12/3
57 2020-02-27 B 13/4
58 2020-02-28 B 14/5
59 2020-02-29 A 1
60 2020-03-01 A 1
CodePudding user response:
I think you're looking for groupby
expanding().mean()
. Then you can groupby
"col" again to shift
:
g = df.groupby('col')['value']
df['new_val'] = g.expanding().mean().groupby(level=0).shift().droplevel(0).fillna(g.transform('first'))
Output:
dates col value new_val
0 2020-01-01 B 1 1.00
1 2020-01-02 B 10 1.00
2 2020-01-03 B 1 5.50
3 2020-01-04 A 1 1.00
4 2020-01-05 A 1 1.00
56 2020-02-26 B 1 4.00
57 2020-02-27 B 1 3.25
58 2020-02-28 B 1 2.80
59 2020-02-29 A 1 1.00
60 2020-03-01 A 1 1.00