Home > Mobile >  Percentage of groupby in Pandas
Percentage of groupby in Pandas

Time:01-12

I have a dataframe that has columns 'team','home_or_away','result' to store the results ('W': win or 'L': loose) for teams 'X','Y','Z' in sporting events at home ('H') or away ('A'):

df = pd.DataFrame({'team': ['X', 'X', 'X', 'X', 'Y', 'Y', 'Z', 'Z', 'Z', 'Z'],'home_or_away':['H', 'A', 'A', 'A', 'H', 'H', 'H', 'A', 'H', 'H'],'result':['W', 'W', 'W', 'L', 'W', 'L', 'W', 'L', 'L', 'L']})

I would like to generate the percentage of wins/losses per team, per event location ('A' or 'H')

I have generated a dataframe with total counts of wins/losses per team and event location, with the following groupby code:

groupedDf =df.groupby(['team', 'home_or_away','result'])[['result']].count()
print(groupedDf)

with the following output:

                          result
team home_or_away result        
X    A            L            1
                  W            2
     H            W            1
Y    H            L            1
                  W            1
Z    A            L            1
     H            L            2
                  W            1

However, I would like to have an extra column with the percentage, like so:

                          result Perc
team home_or_away result        
X    A            L            1 33.33
                  W            2 66.66
     H            W            1 100
Y    H            L            1 50
                  W            1 50
Z    A            L            1 100
     H            L            2 66.66
                  W            1 33.33

How can I do this with pandas? Thanks

CodePudding user response:

Omit the grouping by result, and instead use value_counts:

out = df.groupby(['team', 'home_or_away']).value_counts(normalize=True)

out:

team  home_or_away  result
X     A             W         0.666667
                    L         0.333333
      H             W         1.000000
Y     H             L         0.500000
                    W         0.500000
Z     A             L         1.000000
      H             L         0.666667
                    W         0.333333

optionally multiply by 100.

CodePudding user response:

You can use groupby.transform('sum') to get the total per team/location:

groupedDf =df.groupby(['team', 'home_or_away','result'])[['result']].count()

total = (groupedDf.groupby(level=['team', 'home_or_away'])
         ['result'].transform('sum')
         )

groupedDf['Perc'] = groupedDf['result'] / total * 100

Output:

                          result        Perc
team home_or_away result                    
X    A            L            1   33.333333
                  W            2   66.666667
     H            W            1  100.000000
Y    H            L            1   50.000000
                  W            1   50.000000
Z    A            L            1  100.000000
     H            L            2   66.666667
                  W            1   33.333333
  • Related