I have this dataframe:
Date | Position | TrainerID | Win% |
---|---|---|---|
2017-09-03 | 4 | 1788 | 0 (0 wins, 1 race) |
2017-09-16 | 5 | 1788 | 0 (0 wins, 2 races) |
2017-10-14 | 1 | 1788 | 33 (1 win, 3 races) |
I want to compute on every row of the Win%
Column the winning percentage, as above, for the races in the last 1000 days.
I tried something like this:
def compute_winning_percentage(a, b):
return (a / b)*100
featured_data['Percentage win of trainer in the last 1000 days'] = featured_data.groupby('TrainerID').apply(
compute_winning_percentage(len(featured_data.loc[featured_data.Position == 1]),
featured_data[featured_data.Position].cumcount()))
But I get an error and I have no idea how to insert the last 1000 days part.
How should I do it?
CodePudding user response:
Create a indicator column to represent the win, then group the indicator column by TrainerID
and apply the rolling
mean
to calculate the winning percentage, finally merge
the calculated percentage column with the original dataframe
# Create indicator column
df['win'] = df['Position'].eq(1)
# Groupby and calculate rolling mean on indicator column
w = df.groupby('TrainerID').rolling('1000D', on='Date')['win'].mean().mul(100)
# Merge the result back to dataframe
df_new = df.merge(w.reset_index(name='Win_%'), on=['TrainerID', 'Date'])
>>> df_new
Date Position TrainerID win Win_%
0 2017-09-03 4 1788 False 0.000000
1 2017-09-16 5 1788 False 0.000000
2 2017-10-14 1 1788 True 33.333333