I have a DataFrame
with two column identifier Date
and ID
and try to calculate the rolling mean of the Score
over three days for each ID
individually over time.
Date ID Score
2022-01-02 1 1
2022-01-03 1 2
2022-01-04 1 1
2022-01-05 1 3
2022-01-02 2 5
2022-01-03 2 6
2022-01-04 2 7
2022-01-05 2 3
So far I only know how to create a rolling mean over one particular column without considering the second identifier ID
:
df[RollingMean3]=df[Score].rolling(3).mean()
I try to get
Date ID Score ScoreRollingMean3
2022-01-02 1 1 NaN
2022-01-03 1 2 NaN
2022-01-04 1 1 1.33
2022-01-05 1 3 2
2022-01-02 2 5 NaN
2022-01-03 2 6 NaN
2022-01-04 2 7 6
2022-01-05 2 3 5.33
For reproducability:
df = pd.DataFrame({
'Date':['2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05'],
'ID':[1, 1, 1, 1, 2, 2, 2, 2],
'Score':[1, 2, 1, 3, 5, 6, 7, 3]})
Thank you very much
CodePudding user response:
Use DataFrame.groupby
with Series.droplevel
if datetimes are consecutive:
df['RollingMean3']=df.groupby('ID')['Score'].rolling(3).mean().droplevel(0)
print (df)
Date ID Score RollingMean3
0 2022-01-02 1 1 NaN
1 2022-01-03 1 2 NaN
2 2022-01-04 1 1 1.333333
3 2022-01-05 1 3 2.000000
4 2022-01-02 2 5 NaN
5 2022-01-03 2 6 NaN
6 2022-01-04 2 7 6.000000
7 2022-01-05 2 3 5.333333
General solution with rolling window 3D
is possible with DatetimeIndex
:
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')
df['RollingMean3']=df.groupby('ID')['Score'].rolling('3D').mean().droplevel(0)
print (df)
ID Score RollingMean3
Date
2022-01-02 1 1 1.000000
2022-01-03 1 2 1.500000
2022-01-04 1 1 1.333333
2022-01-05 1 3 2.000000
2022-01-02 2 5 5.000000
2022-01-03 2 6 5.500000
2022-01-04 2 7 6.000000
2022-01-05 2 3 5.333333