Home > Back-end >  Combine Dataframes in Python with same and different Column Names
Combine Dataframes in Python with same and different Column Names

Time:07-07

Hello I have 2 dataframes I want to combine

dataframe 1 :

ID A B C
row1 1 2 3
row2 4 5 6

dataframe 2:

ID A B D
row1 6 7 8

and I want them to merge and replace values of the same row to the values on dataframe 2 like this:

ID A B C D
row1 6 7 3 8
row2 4 5 6 null

how do I do this? I tried merging and concatenation but it doesn't seem to work. Thank you

CodePudding user response:

Another method to merge your 2 dataframes:

>>> pd.concat([df1, df2]).groupby('ID', as_index=False).last()
     ID  A  B    C    D
0  row1  6  7  3.0  8.0
1  row2  4  5  6.0  NaN

CodePudding user response:

Assuming ID is the index in both DataFrames (if not, make it so): There is actually a function combine_first():

out = df2.combine_first(df1)
>>> out
      A  B  C    D
ID                
row1  6  7  3  8.0
row2  4  5  6  NaN

Notes:

  1. why is column D of type float? Because of that NaN.
  2. what if the rows are in different order, e.g. df1 has row2 first and then row1? Not a problem at all and the result is exactly the same as above (with rows sorted). Tested with pandas=1.4.2 and also pandas=1.3.4.
  • Related