I am pulling a Google Sheet into a DataFrame. I first want to check whether any values in a specific column are duplicates, then ask the user to fix the issue in the Google Sheet and rerun that part of the code. Where I'm stuck is how to trigger the rerun if any of the values are True. My approach so far is to check with duplicated() and add the result as a column to the DataFrame, so that I can filter on it and show the user exactly which rows have issues. The data looks like this:
id | record_id |
0 | abc1 |
1 | abc2 |
2 | abc3 |
3 | abc1 |
This is the code I tried so far:
df['record_id_duplicate'] = df.duplicated(subset='record_id', keep=False)
record_id_validation = None
if 'True' in df['record_id_duplicate']:
    record_id_validation = True
else:
    False
The column is added correctly, but I'm not sure where to go from here. This is how df looks after I added the duplicate column. Any help would be appreciated.
id | record_id | record_id_duplicate
0 | abc1 | True
1 | abc2 | False
2 | abc3 | False
3 | abc1 | True
CodePudding user response:
You can call any() on a boolean column. It returns True if any of the values in the column is True, and False otherwise:
>>> df['record_id_duplicate'].any()
True
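To get from that check to the "fix and rerun" behaviour asked about, one option is to wrap the load and the check in a loop. The following is only a rough sketch, not a definitive implementation: find_duplicates and validate_until_clean are hypothetical helper names, and load_sheet stands in for whatever function you already use to pull the Google Sheet into a DataFrame. Note also that `'True' in df['record_id_duplicate']` tests the index labels of the Series rather than its values, which is why the original if statement never fires.

```python
import pandas as pd


def find_duplicates(df, column="record_id"):
    """Return only the rows whose value in `column` appears more than once."""
    mask = df.duplicated(subset=column, keep=False)
    return df[mask]


def validate_until_clean(load_sheet, column="record_id", prompt=input):
    """Reload the sheet until `column` has no duplicates, then return the frame.

    `load_sheet` is an assumed callable that returns a fresh DataFrame each
    time it is called (for example, your existing Google Sheets import).
    """
    while True:
        df = load_sheet()
        duplicates = find_duplicates(df, column)
        if duplicates.empty:
            return df
        # Show the user exactly which rows need fixing.
        print(f"Duplicate values found in {column!r}:")
        print(duplicates)
        # Block until the user confirms, then loop around and reload.
        prompt("Fix these rows in the Google Sheet, then press Enter to re-check... ")
```

You would call it as df = validate_until_clean(load_sheet). Because the sheet is reloaded on every pass, the loop only exits once the user's fixes are actually visible in the data.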