I've been trying to merge two dataframes that look as below, one is multi-indexed while the other is not.
FIRST DATAFRAME: bd_df
outcome opp_name
Sam 3 win Roy Jones
2 win Floyd Mayweather
1 win Bernard Hopkins
James 3 win James Bond
2 win Michael O'Terry
1 win Donald Trump
Jonny 3 win Oscar De la Hoya
2 win Roberto Duran
1 loss Manny Pacquiao
Dyaus 3 win Thierry Henry
2 win David Beckham
1 loss Gabriel Jesus
SECOND DATAFRAME: bt_df
name country colour wins losses
0 Sam England red 10 0
1 Jonny China blue 9 3
2 Dyaus Arsenal white 3 8
3 James USA green 12 6
I'm aiming to merge the two dataframes such that bd_df is joined to bt_df based on the 'name' value where they match. I also have been trying to rename the axis of bd_df
with no luck - code is also below.
My code is as below currently, with the output. Appreciate any help!
boxrec_tables = pd.read_csv(Path(boxrec_tables_path),index_col=[0,1]).rename_axis(['name', 'bout number'])
bt_df = pd.DataFrame(boxrec_tables)
bout_data = pd.read_csv(Path(bout_data_path))
bd_df = pd.DataFrame(bout_data)
OUTPUT
outcome opp_name name country colour wins losses
Sam 3 win Roy Jones James USA green 12 6
2 win Floyd Mayweather Dyaus Arsenal white 3 8
1 win Bernard Hopkins Jonny China blue 9 3
James 3 win James Bond James USA green 12 6
2 win Michael O'Terry Dyaus Arsenal white 3 8
1 win Donald Trump Jonny China blue 9 3
Jonny 3 win Oscar De la Hoya James USA green 12 6
2 win Roberto Duran Dyaus Arsenal white 3 8
1 loss Manny Pacquiao Jonny China blue 9 3
Dyaus 3 win Thierry Henry James USA green 12 6
2 win David Beckham Dyaus Arsenal white 3 8
1 loss Gabriel Jesus Jonny China blue 9 3
Following suggestion by @Jezrael:
df = (bd_df.join(bt_df.set_index('opp name', drop=False)).set_index('name',append=True))
country colour wins losses outcome opp name
name
0 Sam England red 10 0 NaN NaN
1 Jonny China blue 9 3 NaN NaN
2 Dyaus Arsenal white 3 8 NaN NaN
3 James USA green 12 6 NaN NaN
Issue currently that the merged dataframe values are showing as NaN, while the bout number values are missing also
CodePudding user response:
I think you need merge by bout number
in level of MultiIndex
with index in bt_df
:
main_df = (bd_df.reset_index()
.merge(bt_df,
left_on='bout number',
right_index=True,
how='left',
suffixes=('_',''))
.set_index(['name_', 'bout number'])
)
print (main_df)
outcome opp_name name country colour wins \
name_ bout number
Sam 3 win Roy Jones James USA green 12
2 win Floyd Mayweather Dyaus Arsenal white 3
1 win Bernard Hopkins Jonny China blue 9
James 3 win James Bond James USA green 12
2 win Michael O'Terry Dyaus Arsenal white 3
1 win Donald Trump Jonny China blue 9
Jonny 3 win Oscar De la Hoya James USA green 12
2 win Roberto Duran Dyaus Arsenal white 3
1 loss Manny Pacquiao Jonny China blue 9
Dyaus 3 win Thierry Henry James USA green 12
2 win David Beckham Dyaus Arsenal white 3
1 loss Gabriel Jesus Jonny China blue 9
losses
name_ bout number
Sam 3 6
2 8
1 3
James 3 6
2 8
1 3
Jonny 3 6
2 8
1 3
Dyaus 3 6
2 8
1 3