Looking for duplicates in columns between 2 dataframes in pandas Python

Time:02-24

How can I write a function that detects whether two pandas DataFrames share any values in a given column? Comparing the index column of first and second finds no duplicates, but comparing the index column of first and third finds one duplicate, the value 1. The function should return True when there are duplicates and False when there are none.

import pandas as pd

first = pd.DataFrame({'index': [1, 4, 5, 6],
                      'vals': [3, 4, 5, 7]})

second = pd.DataFrame({'index': [13, 7, 8, 9],
                       'vals': [3, 2, 3, 1]})

third = pd.DataFrame({'index': [1, 11, 2, 12],
                      'vals': [6, 7, 51, 2]})

Expected Output:

first and second: False
first and third: True

CodePudding user response:

Build a set from one column, intersect it with the other column, and check whether the result is non-empty:

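>>> # bool() rather than any(): any() tests the truthiness of the shared
>>> # values themselves, so it would return False if the only shared value were 0.
>>> bool(set(first['index']).intersection(second['index']))
False  # because the intersection is empty: set()

>>> bool(set(first['index']).intersection(third['index']))
True  # because the intersection is {1}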
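
Since the question asks for a function that returns a bool, the check can be wrapped up as in the following minimal sketch. The function name has_common_values and its column parameter are hypothetical, chosen only for illustration, and the sketch uses pandas' Series.isin instead of Python sets, which is a different route to the same answer.

```python
import pandas as pd


def has_common_values(left, right, column="index"):
    """Return True if `left` and `right` share at least one value in `column`.

    Hypothetical helper: the name and the `column` default are assumptions.
    """
    # isin() flags each value of left[column] that also occurs in right[column];
    # any() collapses the flags to one value, and bool() turns the NumPy
    # boolean into a plain Python bool.
    return bool(left[column].isin(right[column]).any())


first = pd.DataFrame({'index': [1, 4, 5, 6], 'vals': [3, 4, 5, 7]})
second = pd.DataFrame({'index': [13, 7, 8, 9], 'vals': [3, 2, 3, 1]})
third = pd.DataFrame({'index': [1, 11, 2, 12], 'vals': [6, 7, 51, 2]})

print("first and second:", has_common_values(first, second))
print("first and third:", has_common_values(first, third))
```

Because the result comes from a boolean mask rather than from the shared values themselves, a shared value of 0 is still reported as a duplicate.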