Home > Mobile >  Calculate Win Rates In Python Using Groupby and Lambda Functions
Calculate Win Rates In Python Using Groupby and Lambda Functions

Time:09-06

I'm trying to create a new df from race_dbs that's grouped by 'horse_id' showing the number of times 'place' = 1 as well as the total number of times that 'horse_id' occurs.

Some background on the dataset if it's helpful;

race_dbs contains horse race data. There are 12 horses in a race, for each is shown their odds, fire, place, time, and gate number.

What I'm trying to achieve from this code is the calculation of win rates for each horse.

  • A win is denoted by 'place' = 1
  • Total race count will be calculated by how many times a particular 'horse_id' occurs in the db.

race_dbs

race_id horse_id odds fire place horse_time gate
V14qANzi 398807 NaN 0 1 72.0191 7
xeieZak 191424 NaN 0 8 131.3010 10
xeieZak 139335 NaN 0 1 131.3713 9
xeieZak 137195 NaN 0 11 131.6310 11
xeieZak 398807 NaN 0 12 131.7886 2
... ... .. .. ... ... ..

From this simple table the output would look like, but please bear in mind my dataset is very large, containing 12882353 rows in total.

desired output

horse_id wins races win rate
398807 1 2 50%
191424 0 1 0%
139335 1 1 100%
137195 0 1 0%
... .. .. ...

It should be noted that I'm a complete coding beginner so forgive me if this is an easy solve.

I have tried to use the groupby and lambda pandas functions but I am struggling to combine both functions, and believe there will be a much simpler way.

import pandas as pd

race_db = pd.read_csv('horse_race_data_db.csv')
race_db_2 = pd.read_csv('2_horse_race_data.csv')

frames = [race_db, race_db_2]
race_dbs = pd.concat(frames, ignore_index=True, sort=False)

race_dbs_horse_wins = race_dbs.groupby('horse_id')['place'].apply(lambda x: x[x == 1].count())
race_dbs_horse_sums = race_dbs.groupby('horse_id').aggregate({"horse_id":['sum']})

Thanks for the help!

CodePudding user response:

For count Trues values create helper boolean column and aggregate sum, for win rate aggregate mean and for count use enter image description here

  • Related