I have the following DataFrame:
Date | Jockey ID | Position |
---|---|---|
23-12-2018 | 4340 | 1 |
25-11-2018 | 4340 | 5 |
19-12-2018 | 4340 | 10 |
01-01-2019 | 4340 | 3 |
18-10-2017 | 8443 | 1 |
18-02-2018 | 8443 | 6 |
12-05-2018 | 8443 | 7 |
I want to compute the rolling mean final position for each Jockey ID
for the last 1000 days. I am looking for something like this:
Date | Jockey ID | Position | Mean Position |
---|---|---|---|
23-12-2018 | 4340 | 1 | 1 (1/1) |
25-11-2018 | 4340 | 5 | 3 (1+5)/2 |
19-12-2018 | 4340 | 10 | 5.33 (1+5+10)/3 |
01-01-2019 | 4340 | 3 | 4.75 (1+5+10+3)/4 |
18-10-2017 | 8443 | 1 | 1 (1/1) |
18-02-2018 | 8443 | 6 | 3.5 (1+6)/2 |
12-05-2018 | 8443 | 7 | 4.66 (1+6+7)/3 |
Any ideas on how to do it?
CodePudding user response:
Use:
import pandas as pd

# the sample dates are day-first (dd-mm-yyyy), so parse them that way
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
# freq does not raise an error here, but it also has no effect
df['new'] = (df.set_index('Date')
               .groupby('Jockey ID', sort=False)['Position']
               .expanding(freq='1000D')
               .mean()
               .to_numpy())
print(df)
Date Jockey ID Position new
0 2018-12-23 4340 1 1.000000
1 2018-11-25 4340 5 3.000000
2 2018-12-19 4340 10 5.333333
3 2019-01-01 4340 3 4.750000
4 2017-10-18 8443 1 1.000000
5 2018-02-18 8443 6 3.500000
6 2018-05-12 8443 7 4.666667
# the output is the same for any freq
df['new'] = (df.set_index('Date')
               .groupby('Jockey ID', sort=False)['Position']
               .expanding(freq='30D')
               .mean()
               .to_numpy())
print(df)
Date Jockey ID Position new
0 2018-12-23 4340 1 1.000000
1 2018-11-25 4340 5 3.000000
2 2018-12-19 4340 10 5.333333
3 2019-01-01 4340 3 4.750000
4 2017-10-18 8443 1 1.000000
5 2018-02-18 8443 6 3.500000
6 2018-05-12 8443 7 4.666667
# without freq the output is again the same, so freq is simply ignored
df['new'] = (df.set_index('Date')
               .groupby('Jockey ID', sort=False)['Position']
               .expanding()
               .mean()
               .to_numpy())
print(df)
Date Jockey ID Position new
0 2018-12-23 4340 1 1.000000
1 2018-11-25 4340 5 3.000000
2 2018-12-19 4340 10 5.333333
3 2019-01-01 4340 3 4.750000
4 2017-10-18 8443 1 1.000000
5 2018-02-18 8443 6 3.500000
6 2018-05-12 8443 7 4.666667
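Possible solution with Grouper and GroupBy.transform (note that Grouper splits the dates of each jockey into fixed 1000-day bins rather than a sliding 1000-day window):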
df['new'] = (df.set_index('Date')
               .groupby(['Jockey ID', pd.Grouper(freq='1000D')])['Position']
               .transform(lambda x: x.expanding().mean())
               .to_numpy())
print(df)
Date Jockey ID Position new
0 2018-12-23 4340 1 1.000000
1 2018-11-25 4340 5 3.000000
2 2018-12-19 4340 10 5.333333
3 2019-01-01 4340 3 4.750000
4 2017-10-18 8443 1 1.000000
5 2018-02-18 8443 6 3.500000
6 2018-05-12 8443 7 4.666667