I'm sure this question must have already been answered somewhere but I couldn't find an answer that suits my case.
I have 2 pandas DataFrames
a = pd.DataFrame({'A1':[1,2,3], 'A2':[2,4,6]}, index=['a','b','c'])
b = pd.DataFrame({'A1':[3,5,6], 'A2':[3,6,9]}, index=['a','c','d'])
I want to merge them in order to obtain something like
result = pd.DataFrame({
'A1' : [3,2,5,6],
'A2' : [3,4,6,9]
}, index=['a','b','c','d'])
Basically, I want a new df with the union of both indexes. Where indexes match, the value in each column should be updated with the one from the second df (in this case b
). Where there is no match the value is taken from the starting df (in this case a
).
I tried with merge(), join() and concat() but I could not manage to obtain this result.
CodePudding user response:
If the comments are correct and there's indeed a typo in your result, you could use pd.concat
to create one dataframe (b
being the first one as it is b
that has a priority for it's values to be kept over a
), and then drop the duplicated index:
Using your sample data:
c = pd.concat([b,a])
c[~c.index.duplicated()].sort_index()
prints:
A1 A2
a 3 3
b 2 4
c 5 6
d 6 9