Home > Enterprise >  How can I fill a column with values that are computed between two dates in pandas?
How can I fill a column with values that are computed between two dates in pandas?

Time:11-16

I have this dataframe:

Date Position TrainerID Win%
2017-09-03 4 1788 0 (0 wins, 1 race)
2017-09-16 5 1788 0 (0 wins, 2 races)
2017-10-14 1 1788 33 (1 win, 3 races)

I want to compute on every row of the Win% Column the winning percentage, as above, for the races in the last 1000 days.

I tried something like this:

def compute_winning_percentage(a, b):
    return (a / b)*100

featured_data['Percentage win of trainer in the last 1000 days'] = featured_data.groupby('TrainerID').apply(
    compute_winning_percentage(len(featured_data.loc[featured_data.Position == 1]),
                               featured_data[featured_data.Position].cumcount()))

But I get an error and I have no idea how to insert the last 1000 days part.

How should I do it?

CodePudding user response:

Create a indicator column to represent the win, then group the indicator column by TrainerID and apply the rolling mean to calculate the winning percentage, finally merge the calculated percentage column with the original dataframe

# Create indicator column
df['win'] = df['Position'].eq(1) 

# Groupby and calculate rolling mean on indicator column
w = df.groupby('TrainerID').rolling('1000D', on='Date')['win'].mean().mul(100)

# Merge the result back to dataframe
df_new = df.merge(w.reset_index(name='Win_%'), on=['TrainerID', 'Date'])

>>> df_new

        Date  Position  TrainerID    win      Win_%
0 2017-09-03         4       1788  False   0.000000
1 2017-09-16         5       1788  False   0.000000
2 2017-10-14         1       1788   True  33.333333
  • Related