df
Season | Date | Team | Team_Season_Code | TS | L | Opponent | Opponent_Season_Code | OS |
---|---|---|---|---|---|---|---|---|
2019 | 20181109 | Abilene_Chr | 1_2019 | 94 | Home | Arkansas_St | 15_2019 | 73 |
2019 | 20181115 | Abilene_Chr | 1_2019 | 67 | Away | Denver | 82_2019 | 61 |
2019 | 20181122 | Abilene_Chr | 1_2019 | 72 | N | Elon | 70_2019 | 56 |
2019 | 20181123 | Abilene_Chr | 1_2019 | 73 | Away | Pacific | 224_2019 | 71 |
2019 | 20181124 | Abilene_Chr | 1_2019 | 60 | N | UC_Riverside | 306_2019 | 48 |
Overall_Season_Avg
Team_Season_Code | Team | TS | OS | MOV |
---|---|---|---|---|
15_2019 | Arkansas_St | 70.909091 | 65.242424 | 5.666667 |
70_2019 | Elon | 73.636364 | 71.818182 | 1.818182 |
82_2019 | Denver | 74.03125 | 72.15625 | 1.875 |
224_2019 | Pacific | 78.333333 | 76.466667 | 1.866667 |
306_2019 | UC_Riverside | 79.545455 | 78.060606 | 1.484848 |
I have these two dataframes and I want to be able to look up the Opponent_Season_Code from df in Overall_Season_Avg - "Team_Season_Code" and bring back "TS" and "OS" to create a new column in df called "OOS" and "OTS" So a new column for row 1 in df should have Column name OOS with data - 65.24... and Column name OTS with data 70.90...
In excel its a simple vlookup but i haven't been able to use the solutions that i have found to the vlookup question on overflow so i decided to post my own question. I will also say that the Overall_Season_Avg dataframe was created through by Overall_Season_Avg = df.groupby(['Team_Season_Code', 'Team']).agg({'TS': np.mean, 'OS': np.mean, 'MOV': np.mean})
CodePudding user response:
You can use a merge
, after reworking a bit Overall_Season_Avg
:
df.merge(Overall_Season_Avg
.set_index(['Team_Season_Code', 'Team'])
[['OS', 'TS']].add_prefix('O'),
left_on=['Opponent_Season_Code', 'Opponent'],
right_index=True, how='left'
)
Output:
Season Date Team Team_Season_Code TS L Opponent Opponent_Season_Code OS OOS OTS
0 2019 20181109 Abilene_Chr 1_2019 94 Home Arkansas_St 15_2019 73 65.242424 70.909091
1 2019 20181115 Abilene_Chr 1_2019 67 Away Denver 82_2019 61 72.156250 74.031250
2 2019 20181122 Abilene_Chr 1_2019 72 N Elon 70_2019 56 71.818182 73.636364
3 2019 20181123 Abilene_Chr 1_2019 73 Away Pacific 224_2019 71 76.466667 78.333333
4 2019 20181124 Abilene_Chr 1_2019 60 N UC_Riverside 306_2019 48 78.060606 79.545455
merging only on Opponent_Season_Code
/Team_Season_Code
:
df.merge(Overall_Season_Avg
.set_index('Team_Season_Code')
[['OS', 'TS']].add_prefix('O'),
left_on=['Opponent_Season_Code'],
right_index=True, how='left'
)
Output:
Season Date Team Team_Season_Code TS L Opponent Opponent_Season_Code OS OOS OTS
0 2019 20181109 Abilene_Chr 1_2019 94 Home Arkansas_St 15_2019 73 65.242424 70.909091
1 2019 20181115 Abilene_Chr 1_2019 67 Away Denver 82_2019 61 72.156250 74.031250
2 2019 20181122 Abilene_Chr 1_2019 72 N Elon 70_2019 56 71.818182 73.636364
3 2019 20181123 Abilene_Chr 1_2019 73 Away Pacific 224_2019 71 76.466667 78.333333
4 2019 20181124 Abilene_Chr 1_2019 60 N UC_Riverside 306_2019 48 78.060606 79.545455
CodePudding user response:
df.merge(Overall_Season_Avg, on=['Team_Season_Code', 'Team'], how='left')
and rename column's names
or use transform
instead agg
when make Overall_Season_Avg.
but i don remain transform code becuz you don provide reproducible example
make simple and reproducible example plz