groupby, shift, and average a dataframe-CodePudding

I have a DataFrame with three columns:

For each row, I want to take the average of the past n (n>= 90) data points with the same column value (i.e., average over all dates less than the current date).

At below, I provided an example. col value is 'A' from row 5 to 55.

        dates col  value
0  2020-01-01   B      1
1  2020-01-02   B      10
2  2020-01-03   B      1
3  2020-01-04   A      1
4  2020-01-05   A      1
..        ...  ..    ...
56 2020-02-26   B      1
57 2020-02-27   B      1
58 2020-02-28   B      1
59 2020-02-29   A      1
60 2020-03-01   A      1

Desired output

         dates col  value
 0  2020-01-01   B      1
 1  2020-01-02   B      1
 2  2020-01-03   B      11/2
 3  2020-01-04   A      1
 4  2020-01-05   A      1
 ..        ...  ..    ...
 56 2020-02-26   B      12/3
 57 2020-02-27   B      13/4
 58 2020-02-28   B      14/5
 59 2020-02-29   A      1
 60 2020-03-01   A      1

CodePudding user response：

I think you're looking for groupby expanding().mean(). Then you can groupby "col" again to shift:

g = df.groupby('col')['value']
df['new_val'] = g.expanding().mean().groupby(level=0).shift().droplevel(0).fillna(g.transform('first'))

Output:

        dates col  value  new_val
0  2020-01-01   B      1     1.00
1  2020-01-02   B     10     1.00
2  2020-01-03   B      1     5.50
3  2020-01-04   A      1     1.00
4  2020-01-05   A      1     1.00
56 2020-02-26   B      1     4.00
57 2020-02-27   B      1     3.25
58 2020-02-28   B      1     2.80
59 2020-02-29   A      1     1.00
60 2020-03-01   A      1     1.00