I'm trying to create a new df from race_dbs
that's grouped by 'horse_id'
showing the number of times 'place' = 1
as well as the total number of times that 'horse_id'
occurs.
Some background on the dataset if it's helpful;
race_dbs
contains horse race data. There are 12 horses in a race, for each is shown their odds, fire, place, time, and gate number.
What I'm trying to achieve from this code is the calculation of win rates for each horse.
- A win is denoted by
'place' = 1
- Total race count will be calculated by how many times a particular 'horse_id' occurs in the db.
race_dbs
race_id | horse_id | odds | fire | place | horse_time | gate |
---|---|---|---|---|---|---|
V14qANzi | 398807 | NaN | 0 | 1 | 72.0191 | 7 |
xeieZak | 191424 | NaN | 0 | 8 | 131.3010 | 10 |
xeieZak | 139335 | NaN | 0 | 1 | 131.3713 | 9 |
xeieZak | 137195 | NaN | 0 | 11 | 131.6310 | 11 |
xeieZak | 398807 | NaN | 0 | 12 | 131.7886 | 2 |
... | ... | .. | .. | ... | ... | .. |
From this simple table the output would look like, but please bear in mind my dataset is very large, containing 12882353 rows in total.
desired output
horse_id | wins | races | win rate |
---|---|---|---|
398807 | 1 | 2 | 50% |
191424 | 0 | 1 | 0% |
139335 | 1 | 1 | 100% |
137195 | 0 | 1 | 0% |
... | .. | .. | ... |
It should be noted that I'm a complete coding beginner so forgive me if this is an easy solve.
I have tried to use the groupby and lambda pandas functions but I am struggling to combine both functions, and believe there will be a much simpler way.
import pandas as pd
race_db = pd.read_csv('horse_race_data_db.csv')
race_db_2 = pd.read_csv('2_horse_race_data.csv')
frames = [race_db, race_db_2]
race_dbs = pd.concat(frames, ignore_index=True, sort=False)
race_dbs_horse_wins = race_dbs.groupby('horse_id')['place'].apply(lambda x: x[x == 1].count())
race_dbs_horse_sums = race_dbs.groupby('horse_id').aggregate({"horse_id":['sum']})
Thanks for the help!
CodePudding user response:
For count True
s values create helper boolean column and aggregate sum
, for win rate
aggregate mean
and for count use