I have checked all similar questions, but haven't found a solution to what i look for (or at least i don't get it).
The following is a part of a big DataFrame.
date venue result gf ga team opponent
2017-08-11 Home W 4 1 Arsenal Burnley
2017-08-11 Away L 1 4 Burnley Arsenal
2017-08-11 Home D 2 0 Fulham Leeds
2019-08-11 Home D 1 0 Arsenal Burnley
I would like to make it on one row, if opponent == team
on the same date.
So the goal is to get the following:
date venue result gf ga team opponent gf_opponent ga_opponent
2017-08-11 Home W 4 1 Arsenal Burnley 1 4
2017-08-11 Home D 2 0 Fulham Leeds
2019-08-11 Home D 1 0 Arsenal Burnley
I want to do that for the whole DataFrame, where opponent == team
on the same date.
Thank you in advance.
CodePudding user response:
Do a self join ?
##
df_string = '''date venue result gf ga team opponent
2017-08-11 Home W 4 1 Arsenal Burnley
2017-08-11 Away L 1 4 Burnley Arsenal
2017-08-11 Home D 2 0 Fulham Leeds
2019-08-11 Home D 1 0 Arsenal Burnley'''
df = pd.read_csv(StringIO(df_string),sep='\s ')
df_m = df.merge(df,left_on=['date','team'],right_on=['date','opponent'],suffixes=['_team','_opp'])
CodePudding user response:
You need to switch the team
and opponent
columns during the merge:
is_home = df["venue"] == "Home"
home = df[is_home]
away = df[~is_home]
home.merge(
away[["date", "opponent", "team", "gf", "ga"]],
left_on=["date", "team", "opponent"],
right_on=["date", "opponent", "team"],
how="outer",
suffixes=("", "_opponent"),
).drop(columns=["opponent_opponent", "team_opponent"])