Home > Software engineering >  Rolling mean with two column identifiers
Rolling mean with two column identifiers

Time:09-08

I have a DataFrame with two column identifier Date and ID and try to calculate the rolling mean of the Score over three days for each ID individually over time.

Date        ID  Score
2022-01-02  1   1
2022-01-03  1   2
2022-01-04  1   1
2022-01-05  1   3
2022-01-02  2   5
2022-01-03  2   6
2022-01-04  2   7
2022-01-05  2   3

So far I only know how to create a rolling mean over one particular column without considering the second identifier ID:

df[RollingMean3]=df[Score].rolling(3).mean()

I try to get

Date        ID  Score  ScoreRollingMean3
2022-01-02  1   1      NaN
2022-01-03  1   2      NaN
2022-01-04  1   1      1.33
2022-01-05  1   3      2
2022-01-02  2   5      NaN
2022-01-03  2   6      NaN
2022-01-04  2   7      6
2022-01-05  2   3      5.33

For reproducability:

df = pd.DataFrame({
    'Date':['2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05'],
    'ID':[1, 1, 1, 1, 2, 2, 2, 2],
    'Score':[1, 2, 1, 3, 5, 6, 7, 3]})

Thank you very much

CodePudding user response:

Use DataFrame.groupby with Series.droplevel if datetimes are consecutive:

df['RollingMean3']=df.groupby('ID')['Score'].rolling(3).mean().droplevel(0)
print (df)
         Date  ID  Score  RollingMean3
0  2022-01-02   1      1           NaN
1  2022-01-03   1      2           NaN
2  2022-01-04   1      1      1.333333
3  2022-01-05   1      3      2.000000
4  2022-01-02   2      5           NaN
5  2022-01-03   2      6           NaN
6  2022-01-04   2      7      6.000000
7  2022-01-05   2      3      5.333333

General solution with rolling window 3D is possible with DatetimeIndex:

df['Date'] = pd.to_datetime(df['Date'])

df = df.set_index('Date')
df['RollingMean3']=df.groupby('ID')['Score'].rolling('3D').mean().droplevel(0)
print (df)
            ID  Score  RollingMean3
Date                               
2022-01-02   1      1      1.000000
2022-01-03   1      2      1.500000
2022-01-04   1      1      1.333333
2022-01-05   1      3      2.000000
2022-01-02   2      5      5.000000
2022-01-03   2      6      5.500000
2022-01-04   2      7      6.000000
2022-01-05   2      3      5.333333
  • Related