Home > Software design >  Drop records from a data frame if values in specific columns are present in another data frame
Drop records from a data frame if values in specific columns are present in another data frame

Time:02-15

I have 2 data frames and would like to join them in a way I keep from second df only records unique in terms of specific columns, e.g. A and B.

df1 = pd.DataFrame({'A': [1, 2, 3],
                    'B': [4, 5, 6],
                    'C': [7, 8, 9]})
df2 = pd.DataFrame({'A': [1, 2, 4, 9],
                    'B': [4, 5, 6, 9],
                    'C': [8, 8, 9, 9]})

# return df1   df2 where columns A   B are unique
# there are two duplicates in df2: [1, 4, ...] and [2, 5, ...]


result = pd.DataFrame({'A': [1, 2, 3, 4, 9],
                       'B': [4, 5, 6, 6, 9],
                       'C': [7, 8, 9, 9, 9]})

CodePudding user response:

You can concat your dataframes and drop_duplicates on A and B columns:

out = pd.concat([df1, df2]).drop_duplicates(['A', 'B']).reset_index(drop=True)
print(out)

# Output
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
3  4  6  9
4  9  9  9
  • Related