Home > Blockchain >  How do I add the value from one dataframe based on a column to another dataframe based on a row?
How do I add the value from one dataframe based on a column to another dataframe based on a row?

Time:07-29

I have two dataframes that I am trying to combine, however, I have not been able to do the normal merge or appends because the join is based on a column for one and a row value for another. I want to see where a cell value in a row matches the column value in another, and then add another value based on that comparison. Showing the dataframes should help.

df1:

 date   division     team      opponent
04-03       E0     Man City     Man Utd
05-03       E1     Reading      Millwall
05-03       E2     Wycombe      MK Dons

df2:

 date   E0_avg_goals     E1_avg_goals      E2_avg_goals
04-03       1.9             2.1               1.1
05-03       1.68            2.2               1.3
06-03       1.7             1.9               1.25

end goal:

 date   division     team      opponent   league_avg_goals
04-03       E0     Man City     Man Utd         1.9
05-03       E1     Reading      Millwall        2.2
05-03       E2     Wycombe      MK Dons         1.3

So I want to put the average league goals in for each row, based on which division the team in the 'team' column is in. But since in df1 the division is the column header, I have been unable to do this with normal joins and merges.

Is there a merge or append that I am missing to solve this?

CodePudding user response:

You could actually use a merge but after a bit of reshaping:

pd.merge(df1,df2.melt(id_vars='date').assign(division = lambda x: x['variable'].str[:2]), on=['division','date'])

prints:

     date division      team  opponent      variable  value
0  04-Mar       E0  Man City   Man Utd  E0_avg_goals    1.9
1  05-Mar       E1   Reading  Millwall  E1_avg_goals    2.2
2  05-Mar       E2   Wycombe   MK Dons  E2_avg_goals    1.3

CodePudding user response:

Try this:

df2 = df2.set_index('date')

df2.columns = df2.columns.str.split('_', n=1, expand=True)

df2_map = df.stack(0).rename_axis(['date', 'division'])

df1.set_index(['date', 'division']).join(df2_map).reset_index()

Output:

    date division      team  opponent  avg_goals
0  04-03       E0  Man City   Man Utd        1.9
1  05-03       E1   Reading  Millwall        2.2
2  05-03       E2   Wycombe   MK Dons        1.3
  • Related