Home > Mobile >  Moving Average Pandas Across Group
Moving Average Pandas Across Group

Time:02-17

My data has the following structure:

np.random.seed(25)
tdf = pd.DataFrame({'person_id' :[1,1,1,1,
                                  2,2,
                                  3,3,3,3,3,
                                  4,4,4,
                                  5,5,5,5,5,5,5,
                                  6,
                                  7,7,
                                  8,8,8,8,8,8,8,
                                  9,9,
                                  10,10
                                  ],
                    'Date': ['2021-01-02','2021-01-05','2021-01-07','2021-01-09',
                             '2021-01-02','2021-01-05',
                             '2021-01-02','2021-01-05','2021-01-07','2021-01-09','2021-01-11',
                             '2021-01-02','2021-01-05','2021-01-07',
                             '2021-01-02','2021-01-05','2021-01-07','2021-01-09','2021-01-11','2021-01-13','2021-01-15',
                             '2021-01-02',
                              '2021-01-02','2021-01-05',
                              '2021-01-02','2021-01-05','2021-01-07','2021-01-09','2021-01-11','2021-01-13','2021-01-15',
                              '2021-01-02','2021-01-05',
                              '2021-01-02','2021-01-05'
                             ],
                    'Quantity': np.floor(np.random.random(size=35)*100)
                    })

And I want to calculate moving average (2 periods) over Date. So, the final output looks like the following. For first MA, we are taking 2021-01-02 & 2021-01-05 across all observations & calculate the MA (50). Similarly for other dates. The output need not be in the structure I'm showing the report. I just need date & MA column in the final data. enter image description here

Thanks!

CodePudding user response:

IIUC, you can aggregate the similar dates first, getting the sum and count.

Then take the sum per rolling 2 dates (here it doesn't look like you want to take care of a defined period but rather raw successive values, so I am assuming here prior sorting).

Finally, perform the ratio of sum and count to get the mean:

g = tdf.groupby('Date')['Quantity']
out = g.sum().rolling(2).sum()/g.count().rolling(2).sum()

output:

Date
2021-01-02          NaN
2021-01-05    50.210526
2021-01-07    45.071429
2021-01-09    41.000000
2021-01-11    44.571429
2021-01-13    48.800000
2021-01-15    50.500000
Name: Quantity, dtype: float64
joining the original data:
g = tdf.groupby('Date')['Quantity']
s = g.sum().rolling(2).sum()/g.count().rolling(2).sum()
tdf.merge(s.rename('Quantity_MA(2)'), left_on='Date', right_index=True)

output:

    person_id       Date  Quantity  Quantity_MA(2)
0           1 2021-01-02      87.0             NaN
4           2 2021-01-02      41.0             NaN
6           3 2021-01-02      68.0             NaN
11          4 2021-01-02      11.0             NaN
14          5 2021-01-02      16.0             NaN
21          6 2021-01-02      51.0             NaN
22          7 2021-01-02      38.0             NaN
24          8 2021-01-02      51.0             NaN
31          9 2021-01-02      90.0             NaN
33         10 2021-01-02      45.0             NaN
1           1 2021-01-05      58.0       50.210526
5           2 2021-01-05      11.0       50.210526
7           3 2021-01-05      43.0       50.210526
12          4 2021-01-05      44.0       50.210526
15          5 2021-01-05      52.0       50.210526
23          7 2021-01-05      99.0       50.210526
25          8 2021-01-05      55.0       50.210526
32          9 2021-01-05      66.0       50.210526
34         10 2021-01-05      28.0       50.210526
2           1 2021-01-07      27.0       45.071429
8           3 2021-01-07      55.0       45.071429
13          4 2021-01-07      58.0       45.071429
16          5 2021-01-07      32.0       45.071429
26          8 2021-01-07       3.0       45.071429
3           1 2021-01-09      18.0       41.000000
9           3 2021-01-09      36.0       41.000000
17          5 2021-01-09      69.0       41.000000
27          8 2021-01-09      71.0       41.000000
10          3 2021-01-11      40.0       44.571429
18          5 2021-01-11      36.0       44.571429
28          8 2021-01-11      42.0       44.571429
19          5 2021-01-13      83.0       48.800000
29          8 2021-01-13      43.0       48.800000
20          5 2021-01-15      48.0       50.500000
30          8 2021-01-15      28.0       50.500000
  • Related