How can I write a function that detects whether two pandas DataFrames share any values in a given column? If I compare the index column between first and second, there are no duplicates. But if I compare the index column between first and third, the value 1 appears in both. I want the function to return a bool: True when there are duplicates and False when there aren't.
import pandas as pd
first = pd.DataFrame({'index': [1,4,5,6],
'vals':[3,4,5,7] })
second = pd.DataFrame({'index': [13,7,8,9],
'vals':[3,2,3,1] })
third = pd.DataFrame({'index': [1,11,2,12],
'vals':[6,7,51,2] })
Expected Output:
first and second: False
first and third: True
CodePudding user response:
Use a set intersection as the predicate:
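>>> bool(set(first['index']).intersection(second['index']))
False # because the intersection is empty: set()
>>> bool(set(first['index']).intersection(third['index']))
True # because the intersection is {1}
Here bool() tests whether the intersection is non-empty. any() would instead test the truthiness of the shared values, so it would return False if the only shared value were 0.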
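Since the question asks for a function, here is a minimal sketch of one way to wrap the set check. The name has_common_values and its column parameter are made up for illustration, and it assumes both frames contain the named column with hashable values. It uses set.isdisjoint, which stops at the first shared value instead of building the whole intersection.

```python
import pandas as pd


def has_common_values(left: pd.DataFrame, right: pd.DataFrame,
                      column: str = "index") -> bool:
    """Return True if `left` and `right` share at least one value in `column`.

    Hypothetical helper: assumes `column` exists in both frames.
    """
    # isdisjoint accepts any iterable, so only one side has to be a set.
    return not set(left[column]).isdisjoint(right[column])


first = pd.DataFrame({'index': [1, 4, 5, 6], 'vals': [3, 4, 5, 7]})
second = pd.DataFrame({'index': [13, 7, 8, 9], 'vals': [3, 2, 3, 1]})
third = pd.DataFrame({'index': [1, 11, 2, 12], 'vals': [6, 7, 51, 2]})

print("first and second:", has_common_values(first, second))
print("first and third:", has_common_values(first, third))
```

A pandas-only alternative is first['index'].isin(third['index']).any(), which gives the same answer but returns a NumPy boolean, so wrap it in bool() if a plain Python bool is required.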