Hello I have 2 dataframes I want to combine
dataframe 1 :
ID | A | B | C |
---|---|---|---|
row1 | 1 | 2 | 3 |
row2 | 4 | 5 | 6 |
dataframe 2:
ID | A | B | D |
---|---|---|---|
row1 | 6 | 7 | 8 |
and I want them to merge and replace values of the same row to the values on dataframe 2 like this:
ID | A | B | C | D |
---|---|---|---|---|
row1 | 6 | 7 | 3 | 8 |
row2 | 4 | 5 | 6 | null |
how do I do this? I tried merging and concatenation but it doesn't seem to work. Thank you
CodePudding user response:
Another method to merge your 2 dataframes:
>>> pd.concat([df1, df2]).groupby('ID', as_index=False).last()
ID A B C D
0 row1 6 7 3.0 8.0
1 row2 4 5 6.0 NaN
CodePudding user response:
Assuming ID
is the index in both DataFrames (if not, make it so): There is actually a function combine_first()
:
out = df2.combine_first(df1)
>>> out
A B C D
ID
row1 6 7 3 8.0
row2 4 5 6 NaN
Notes:
- why is column
D
of typefloat
? Because of thatNaN
. - what if the rows are in different order, e.g.
df1
hasrow2
first and thenrow1
? Not a problem at all and the result is exactly the same as above (with rows sorted). Tested withpandas=1.4.2
and alsopandas=1.3.4
.