I have two dataframes that I am trying to combine, however, I have not been able to do the normal merge or appends because the join is based on a column for one and a row value for another. I want to see where a cell value in a row matches the column value in another, and then add another value based on that comparison. Showing the dataframes should help.
df1:
date division team opponent
04-03 E0 Man City Man Utd
05-03 E1 Reading Millwall
05-03 E2 Wycombe MK Dons
df2:
date E0_avg_goals E1_avg_goals E2_avg_goals
04-03 1.9 2.1 1.1
05-03 1.68 2.2 1.3
06-03 1.7 1.9 1.25
end goal:
date division team opponent league_avg_goals
04-03 E0 Man City Man Utd 1.9
05-03 E1 Reading Millwall 2.2
05-03 E2 Wycombe MK Dons 1.3
So I want to put the average league goals in for each row, based on which division the team in the 'team' column is in. But since in df1 the division is the column header, I have been unable to do this with normal joins and merges.
Is there a merge or append that I am missing to solve this?
CodePudding user response:
You could actually use a merge but after a bit of reshaping:
pd.merge(df1,df2.melt(id_vars='date').assign(division = lambda x: x['variable'].str[:2]), on=['division','date'])
prints:
date division team opponent variable value
0 04-Mar E0 Man City Man Utd E0_avg_goals 1.9
1 05-Mar E1 Reading Millwall E1_avg_goals 2.2
2 05-Mar E2 Wycombe MK Dons E2_avg_goals 1.3
CodePudding user response:
Try this:
df2 = df2.set_index('date')
df2.columns = df2.columns.str.split('_', n=1, expand=True)
df2_map = df.stack(0).rename_axis(['date', 'division'])
df1.set_index(['date', 'division']).join(df2_map).reset_index()
Output:
date division team opponent avg_goals
0 04-03 E0 Man City Man Utd 1.9
1 05-03 E1 Reading Millwall 2.2
2 05-03 E2 Wycombe MK Dons 1.3