I have a dataset of Premier League Data:
I want to find which team has scored the most goals since 1993. I attempted to group the data by HomeTeam
and FTGH
(Full Time Home Goals) but this is the result:
home_goals = df.groupby(['HomeTeam', 'FTHG']).sum()
Is there another way to calculate the total goals a team has scored, both home & away?
CodePudding user response:
You can aggregate separately HomeTeam
and AwayTeam
and then sum them by Series.add
:
home_goals = df.groupby('HomeTeam')['FTHG'].sum()
away_goals = df.groupby('AwayTeam')['FTAG'].sum()
df = (home_goals.add(away_goals, fill_value=0)
.rename_axis('Team')
.reset_index(name='Goals')
.sort_values('Goals', ascending=False, ignore_index=True))
CodePudding user response:
You could stack the away results and home results to get two columns: Team and Goals (independent of home or away). Then groupby and sum:
df = pd.DataFrame({"HomeTeam": ["A", "B", "C", "D"],
"AwayTeam": ["B", "C", "D", "A"],
"FTHG": [1,2,3,4],
"FTAG": [0,1,0,1]})
# HomeTeam AwayTeam FTHG FTAG
# 0 A B 1 0
# 1 B C 2 1
# 2 C D 3 0
# 3 D A 4 1
df_stacked = pd.concat([df.set_index("HomeTeam")["FTHG"].rename("Goals"),
df.set_index("AwayTeam")["FTAG"].rename("Goals")])
# Goals
# A 1
# B 2
# C 3
# D 4
# B 0
# C 1
# D 0
# A 1
df_stacked.groupby(level=0).sum()
# A 2
# B 2
# C 4
# D 4