I have 2 data frames and would like to join them in a way I keep from second df only records unique in terms of specific columns, e.g. A and B.
df1 = pd.DataFrame({'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]})
df2 = pd.DataFrame({'A': [1, 2, 4, 9],
'B': [4, 5, 6, 9],
'C': [8, 8, 9, 9]})
# return df1 df2 where columns A B are unique
# there are two duplicates in df2: [1, 4, ...] and [2, 5, ...]
result = pd.DataFrame({'A': [1, 2, 3, 4, 9],
'B': [4, 5, 6, 6, 9],
'C': [7, 8, 9, 9, 9]})
CodePudding user response:
You can concat
your dataframes and drop_duplicates
on A
and B
columns:
out = pd.concat([df1, df2]).drop_duplicates(['A', 'B']).reset_index(drop=True)
print(out)
# Output
A B C
0 1 4 7
1 2 5 8
2 3 6 9
3 4 6 9
4 9 9 9