Home > other >  Include column index name instead of suffix names when joining pd.DataFrame's with same columns
Include column index name instead of suffix names when joining pd.DataFrame's with same columns

Time:11-01

Assume I have the following two pd.DataFrame's

first = pd.DataFrame(data={'col': ['bla', 'blub'], 'a': [1, 2], 'b': [3, 4]})
second = pd.DataFrame(data={'col': ['bla', 'blub'], 'a': [5, 6], 'b': [7, 8]})

which creates these two

# first 

   col   a   b 
--------------
0  bla   1   3
1  blub  2   4
  
# second
   col   a   b 
--------------
0  bla   5   7
1  blub  6   8

Now I want simply want to join both on col which could easily be done like that

pd.merge(left=first, right=second, on=['col'], suffixes=['_first', '_second'])
# Current result
     col   a_first  b_first  a_second  b_second
-----------------------------------------------
0    bla   1        3        5         7
1    blub  2        4        6         8

However I don't really want to have these column suffixes, I would like to have the column names as they are, but have a column index which states from which table it comes. Something like this

# Wanted result
           first   second
     col   a   b   a   b
-------------------------
0    bla   1   3   5   7
1    blub  2   4   6   8

Could anyone help with that?

CodePudding user response:

Let's do concat along axis=1 and use keys parameter to add additional column level

pd.concat([first.set_index('col'), second.set_index('col')], keys=('first', 'second'), axis=1)

Result

         first   second   
         a  b    a  b
col                    
bla      1  3    5  7
blub     2  4    6  8
  • Related