Home > OS >  Calculating the team with the most goals using groupby in Python
Calculating the team with the most goals using groupby in Python

Time:03-14

I have a dataset of Premier League Data:

enter image description here

I want to find which team has scored the most goals since 1993. I attempted to group the data by HomeTeam and FTGH (Full Time Home Goals) but this is the result:

home_goals = df.groupby(['HomeTeam', 'FTHG']).sum()

enter image description here

Is there another way to calculate the total goals a team has scored, both home & away?

CodePudding user response:

You can aggregate separately HomeTeam and AwayTeam and then sum them by Series.add:

home_goals = df.groupby('HomeTeam')['FTHG'].sum()
away_goals = df.groupby('AwayTeam')['FTAG'].sum() 

df = (home_goals.add(away_goals, fill_value=0)
                .rename_axis('Team')
                .reset_index(name='Goals')
                .sort_values('Goals', ascending=False, ignore_index=True))

CodePudding user response:

You could stack the away results and home results to get two columns: Team and Goals (independent of home or away). Then groupby and sum:

df = pd.DataFrame({"HomeTeam": ["A", "B", "C", "D"],
                   "AwayTeam": ["B", "C", "D", "A"],
                   "FTHG": [1,2,3,4],
                   "FTAG": [0,1,0,1]})

#   HomeTeam AwayTeam  FTHG  FTAG
# 0        A        B     1     0
# 1        B        C     2     1
# 2        C        D     3     0
# 3        D        A     4     1

df_stacked = pd.concat([df.set_index("HomeTeam")["FTHG"].rename("Goals"),
                        df.set_index("AwayTeam")["FTAG"].rename("Goals")])

 #        Goals
 #     A      1
 #     B      2
 #     C      3
 #     D      4
 #     B      0
 #     C      1
 #     D      0
 #     A      1

df_stacked.groupby(level=0).sum()

# A         2
# B         2
# C         4
# D         4
  • Related