Home > Enterprise >  How to merge two dataframes, where one is multi-indexed, with different headers
How to merge two dataframes, where one is multi-indexed, with different headers

Time:11-30

I've been trying to merge two dataframes that look as below, one is multi-indexed while the other is not.

FIRST DATAFRAME: bd_df

        outcome          opp_name     
Sam   3     win         Roy Jones     
      2     win  Floyd Mayweather     
      1     win   Bernard Hopkins     
James 3     win        James Bond     
      2     win   Michael O'Terry     
      1     win      Donald Trump     
Jonny 3     win  Oscar De la Hoya     
      2     win     Roberto Duran     
      1    loss    Manny Pacquiao     
Dyaus 3     win     Thierry Henry     
      2     win     David Beckham     
      1    loss     Gabriel Jesus     
SECOND DATAFRAME: bt_df

    name  country colour  wins  losses
0    Sam  England    red    10       0
1  Jonny    China   blue     9       3
2  Dyaus  Arsenal  white     3       8
3  James      USA  green    12       6

I'm aiming to merge the two dataframes such that bd_df is joined to bt_df based on the 'name' value where they match. I also have been trying to rename the axis of bd_df with no luck - code is also below.

My code is as below currently, with the output. Appreciate any help!

boxrec_tables = pd.read_csv(Path(boxrec_tables_path),index_col=[0,1]).rename_axis(['name', 'bout number'])
bt_df = pd.DataFrame(boxrec_tables)

bout_data = pd.read_csv(Path(bout_data_path))
bd_df = pd.DataFrame(bout_data)
OUTPUT
        outcome          opp_name   name  country colour  wins  losses
Sam   3     win         Roy Jones  James      USA  green    12       6
      2     win  Floyd Mayweather  Dyaus  Arsenal  white     3       8
      1     win   Bernard Hopkins  Jonny    China   blue     9       3
James 3     win        James Bond  James      USA  green    12       6
      2     win   Michael O'Terry  Dyaus  Arsenal  white     3       8
      1     win      Donald Trump  Jonny    China   blue     9       3
Jonny 3     win  Oscar De la Hoya  James      USA  green    12       6
      2     win     Roberto Duran  Dyaus  Arsenal  white     3       8
      1    loss    Manny Pacquiao  Jonny    China   blue     9       3
Dyaus 3     win     Thierry Henry  James      USA  green    12       6
      2     win     David Beckham  Dyaus  Arsenal  white     3       8
      1    loss     Gabriel Jesus  Jonny    China   blue     9       3

Following suggestion by @Jezrael:

df = (bd_df.join(bt_df.set_index('opp name', drop=False)).set_index('name',append=True))
         country colour  wins  losses outcome opp name
  name
0 Sam    England    red    10       0     NaN      NaN
1 Jonny    China   blue     9       3     NaN      NaN
2 Dyaus  Arsenal  white     3       8     NaN      NaN
3 James      USA  green    12       6     NaN      NaN

Issue currently that the merged dataframe values are showing as NaN, while the bout number values are missing also

CodePudding user response:

I think you need merge by bout number in level of MultiIndex with index in bt_df:

main_df = (bd_df.reset_index()
                .merge(bt_df, 
                       left_on='bout number',
                       right_index=True, 
                       how='left', 
                       suffixes=('_',''))
                .set_index(['name_', 'bout number'])
                )

print (main_df)
                  outcome          opp_name   name  country colour  wins  \
name_ bout number                                                          
Sam   3               win         Roy Jones  James      USA  green    12   
      2               win  Floyd Mayweather  Dyaus  Arsenal  white     3   
      1               win   Bernard Hopkins  Jonny    China   blue     9   
James 3               win        James Bond  James      USA  green    12   
      2               win   Michael O'Terry  Dyaus  Arsenal  white     3   
      1               win      Donald Trump  Jonny    China   blue     9   
Jonny 3               win  Oscar De la Hoya  James      USA  green    12   
      2               win     Roberto Duran  Dyaus  Arsenal  white     3   
      1              loss    Manny Pacquiao  Jonny    China   blue     9   
Dyaus 3               win     Thierry Henry  James      USA  green    12   
      2               win     David Beckham  Dyaus  Arsenal  white     3   
      1              loss     Gabriel Jesus  Jonny    China   blue     9   

                   losses  
name_ bout number          
Sam   3                 6  
      2                 8  
      1                 3  
James 3                 6  
      2                 8  
      1                 3  
Jonny 3                 6  
      2                 8  
      1                 3  
Dyaus 3                 6  
      2                 8  
      1                 3  
  • Related