Home > Blockchain >  Keeping both DataFrame indexes on merge
Keeping both DataFrame indexes on merge

Time:05-01

I'm sure this question must have already been answered somewhere but I couldn't find an answer that suits my case.
I have 2 pandas DataFrames

a = pd.DataFrame({'A1':[1,2,3], 'A2':[2,4,6]}, index=['a','b','c'])
b = pd.DataFrame({'A1':[3,5,6], 'A2':[3,6,9]}, index=['a','c','d'])

I want to merge them in order to obtain something like

result = pd.DataFrame({
    'A1' : [3,2,5,6],
    'A2' : [3,4,6,9]
}, index=['a','b','c','d'])

Basically, I want a new df with the union of both indexes. Where indexes match, the value in each column should be updated with the one from the second df (in this case b). Where there is no match the value is taken from the starting df (in this case a).
I tried with merge(), join() and concat() but I could not manage to obtain this result.

CodePudding user response:

If the comments are correct and there's indeed a typo in your result, you could use pd.concat to create one dataframe (b being the first one as it is b that has a priority for it's values to be kept over a), and then drop the duplicated index:

Using your sample data:

c = pd.concat([b,a])
c[~c.index.duplicated()].sort_index()

prints:

   A1  A2
a   3   3
b   2   4
c   5   6
d   6   9
  • Related