I have a dataframe that has columns 'team','home_or_away','result' to store the results ('W': win or 'L': loose) for teams 'X','Y','Z' in sporting events at home ('H') or away ('A'):
df = pd.DataFrame({'team': ['X', 'X', 'X', 'X', 'Y', 'Y', 'Z', 'Z', 'Z', 'Z'],'home_or_away':['H', 'A', 'A', 'A', 'H', 'H', 'H', 'A', 'H', 'H'],'result':['W', 'W', 'W', 'L', 'W', 'L', 'W', 'L', 'L', 'L']})
I would like to generate the percentage of wins/losses per team, per event location ('A' or 'H')
I have generated a dataframe with total counts of wins/losses per team and event location, with the following groupby code:
groupedDf =df.groupby(['team', 'home_or_away','result'])[['result']].count()
print(groupedDf)
with the following output:
result
team home_or_away result
X A L 1
W 2
H W 1
Y H L 1
W 1
Z A L 1
H L 2
W 1
However, I would like to have an extra column with the percentage, like so:
result Perc
team home_or_away result
X A L 1 33.33
W 2 66.66
H W 1 100
Y H L 1 50
W 1 50
Z A L 1 100
H L 2 66.66
W 1 33.33
How can I do this with pandas? Thanks
CodePudding user response:
Omit the grouping by result
, and instead use value_counts
:
out = df.groupby(['team', 'home_or_away']).value_counts(normalize=True)
out:
team home_or_away result
X A W 0.666667
L 0.333333
H W 1.000000
Y H L 0.500000
W 0.500000
Z A L 1.000000
H L 0.666667
W 0.333333
optionally multiply by 100.
CodePudding user response:
You can use groupby.transform('sum')
to get the total per team/location:
groupedDf =df.groupby(['team', 'home_or_away','result'])[['result']].count()
total = (groupedDf.groupby(level=['team', 'home_or_away'])
['result'].transform('sum')
)
groupedDf['Perc'] = groupedDf['result'] / total * 100
Output:
result Perc
team home_or_away result
X A L 1 33.333333
W 2 66.666667
H W 1 100.000000
Y H L 1 50.000000
W 1 50.000000
Z A L 1 100.000000
H L 2 66.666667
W 1 33.333333