How can I compute the rolling mean of a column for a set period of time, using Pandas and groupby?

I have the following DataFrame:

Date	Jockey ID	Position
23-12-2018	4340	1
25-11-2018	4340	5
19-12-2018	4340	10
01-01-2019	4340	3
18-10-2017	8443	1
18-02-2018	8443	6
12-05-2018	8443	7

I want to compute the rolling mean final position for each Jockey ID for the last 1000 days. I am looking for something like this:

Date	Jockey ID	Position	Mean Position
23-12-2018	4340	1	1 (1/1)
25-11-2018	4340	5	3 (1+5)/2
19-12-2018	4340	10	5.33 (1+5+10)/3
01-01-2019	4340	3	4.75 (1+5+10+3)/4
18-10-2017	8443	1	1 (1/1)
18-02-2018	8443	6	3.5 (1+6)/2
12-05-2018	8443	7	4.66 (1+6+7)/3

Any ideas on how to do it?

CodePudding user response:

Use:

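import pandas as pd

#dates are day-first (DD-MM-YYYY), so give the format explicitly
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y')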

#freq does not raise an error here, but it has no effect
df['new'] = (df.set_index('Date')
               .groupby('Jockey ID', sort=False)['Position']
               .expanding(freq='1000D')
               .mean()
               .to_numpy())
print (df)
        Date  Jockey ID  Position       new
0 2018-12-23       4340         1  1.000000
1 2018-11-25       4340         5  3.000000
2 2018-12-19       4340        10  5.333333
3 2019-01-01       4340         3  4.750000
4 2017-10-18       8443         1  1.000000
5 2018-02-18       8443         6  3.500000
6 2018-05-12       8443         7  4.666667

#same output for any freq
df['new'] = (df.set_index('Date')
               .groupby('Jockey ID', sort=False)['Position']
               .expanding(freq='30D')
               .mean()
               .to_numpy())
print (df)
        Date  Jockey ID  Position       new
0 2018-12-23       4340         1  1.000000
1 2018-11-25       4340         5  3.000000
2 2018-12-19       4340        10  5.333333
3 2019-01-01       4340         3  4.750000
4 2017-10-18       8443         1  1.000000
5 2018-02-18       8443         6  3.500000
6 2018-05-12       8443         7  4.666667

#without freq the output is the same again
df['new'] = (df.set_index('Date')
               .groupby('Jockey ID', sort=False)['Position']
               .expanding()
               .mean()
               .to_numpy())
print (df)
        Date  Jockey ID  Position       new
0 2018-12-23       4340         1  1.000000
1 2018-11-25       4340         5  3.000000
2 2018-12-19       4340        10  5.333333
3 2019-01-01       4340         3  4.750000
4 2017-10-18       8443         1  1.000000
5 2018-02-18       8443         6  3.500000
6 2018-05-12       8443         7  4.666667
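
The identical outputs are expected: an expanding window always runs from the first row of the group to the current row, so it has no length that a freq could limit. A minimal sketch of that, assuming the DataFrame is rebuilt from the question (the names `expanding_mean` and `manual` are only illustrative), compares the expanding mean with a cumulative sum divided by a cumulative count:

```python
import pandas as pd

# Data rebuilt from the question (dates are not needed for this check).
df = pd.DataFrame({
    "Jockey ID": [4340, 4340, 4340, 4340, 8443, 8443, 8443],
    "Position": [1, 5, 10, 3, 1, 6, 7],
})

g = df.groupby("Jockey ID", sort=False)["Position"]

# Expanding mean per jockey, in row order.
expanding_mean = g.expanding().mean().to_numpy()

# The same thing by hand: running sum divided by running row count.
manual = (g.cumsum() / (g.cumcount() + 1)).to_numpy()

print(expanding_mean)
print(manual)
```

Both arrays hold the same values, which is why no choice of freq changes the result.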

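Possible solution with Grouper and GroupBy.transform. Note that Grouper splits the dates into fixed 1000-day bins and the expanding mean restarts in every bin, so this is not a sliding window; in this data all dates fall into a single bin: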

df['new'] = (df.set_index('Date')
               .groupby(['Jockey ID', pd.Grouper(freq='1000D')])['Position']
               .transform(lambda x: x.expanding().mean())
               .to_numpy())
print (df)
        Date  Jockey ID  Position       new
0 2018-12-23       4340         1  1.000000
1 2018-11-25       4340         5  3.000000
2 2018-12-19       4340        10  5.333333
3 2019-01-01       4340         3  4.750000
4 2017-10-18       8443         1  1.000000
5 2018-02-18       8443         6  3.500000
6 2018-05-12       8443         7  4.666667
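
If what is needed is a true sliding window (only the races within the 1000 days up to each row's date), one option is a time-based rolling window per group. This is only a sketch under some assumptions: the DataFrame is rebuilt from the question, the dates are read as day-first, and the column name `Mean Position` is taken from the desired output. A time-based window needs the dates sorted within each group, so the means are computed in chronological order and then aligned back to the original rows:

```python
import pandas as pd

# Data rebuilt from the question; dates are DD-MM-YYYY.
df = pd.DataFrame({
    "Date": ["23-12-2018", "25-11-2018", "19-12-2018", "01-01-2019",
             "18-10-2017", "18-02-2018", "12-05-2018"],
    "Jockey ID": [4340, 4340, 4340, 4340, 8443, 8443, 8443],
    "Position": [1, 5, 10, 3, 1, 6, 7],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")

# Sort so that dates increase within each jockey; the original
# row labels are kept in the index for aligning back later.
ordered = df.sort_values(["Jockey ID", "Date"])

# Mean of Position over the 1000 days ending at each row's date.
rolled = (ordered.set_index("Date")
                 .groupby("Jockey ID")["Position"]
                 .rolling("1000D")
                 .mean())

# groupby orders groups by key and keeps row order inside each group,
# which matches the order of `ordered`, so positions line up.
df["Mean Position"] = pd.Series(rolled.to_numpy(), index=ordered.index)
print(df)
```

Because the window is chronological, the values differ from the row-order means in the question wherever the rows are not sorted by date: the first row (23-12-2018) gets 5.33 rather than 1, since two earlier races by the same jockey fall inside its window.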