Home > Blockchain >  Merging/Concat rows in Pandas, based on some specific conditions
Merging/Concat rows in Pandas, based on some specific conditions

Time:09-12

I have checked all similar questions, but haven't found a solution to what i look for (or at least i don't get it).

The following is a part of a big DataFrame.

date venue result gf ga team opponent
2017-08-11 Home W 4 1 Arsenal Burnley
2017-08-11 Away L 1 4 Burnley Arsenal
2017-08-11 Home D 2 0 Fulham Leeds
2019-08-11 Home D 1 0 Arsenal Burnley

I would like to make it on one row, if opponent == team on the same date.

So the goal is to get the following:

date venue result gf ga team opponent gf_opponent ga_opponent
2017-08-11 Home W 4 1 Arsenal Burnley 1 4
2017-08-11 Home D 2 0 Fulham Leeds
2019-08-11 Home D 1 0 Arsenal Burnley

I want to do that for the whole DataFrame, where opponent == team on the same date.

Thank you in advance.

CodePudding user response:

Do a self join ?

##
df_string = '''date     venue   result  gf  ga  team     opponent
2017-08-11  Home     W      4   1  Arsenal   Burnley
2017-08-11  Away     L      1   4  Burnley   Arsenal
2017-08-11  Home     D      2   0  Fulham    Leeds
2019-08-11  Home     D      1   0  Arsenal   Burnley'''
df = pd.read_csv(StringIO(df_string),sep='\s ')

df_m = df.merge(df,left_on=['date','team'],right_on=['date','opponent'],suffixes=['_team','_opp'])

CodePudding user response:

You need to switch the team and opponent columns during the merge:

is_home = df["venue"] == "Home"
home = df[is_home]
away = df[~is_home]

home.merge(
    away[["date", "opponent", "team", "gf", "ga"]],
    left_on=["date", "team", "opponent"],
    right_on=["date", "opponent", "team"],
    how="outer",
    suffixes=("", "_opponent"),
).drop(columns=["opponent_opponent", "team_opponent"])
  • Related