How to find if any values in pandas data frame are duplicated and rerun a piece of code after user c

Time:09-18

I am pulling a Google Sheet into a dataframe. I first want to find out whether any of the values in a specific column are duplicated, then ask the user to fix the issue in the Google Sheet and rerun that part of the code. Where I'm stuck is how to trigger the rerun if any of the values are True. My approach so far was to check with duplicated() and add the result as a column to the dataframe, so that I can filter on it and show the user exactly which rows have issues.

id | record_id | 
0  | abc1      |
1  | abc2      |
2  | abc3      |
3  | abc1      |

This is the code I tried so far:

df['record_id_duplicate'] = df.duplicated(subset='record_id', keep=False)

record_id_validation = None
if 'True' in df['record_id_duplicate']:
    record_id_validation = True
else:
    False

The column is added correctly, but I'm not sure where to go from here. This is how df looks after adding the duplicated column. Any help would be appreciated.

id | record_id | record_id_duplicate
0  | abc1      | True
1  | abc2      | False
2  | abc3      | False
3  | abc1      | True

CodePudding user response:

You can call any() on the boolean column. It returns True if at least one value in the column is True, and False otherwise:

>>> df['record_id_duplicate'].any()
True
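
As for why the original check never fires: the in operator on a pandas Series tests the index labels, not the values, and 'True' is a string rather than the boolean True, so the condition is always False for this data. For the "rerun after the user fixes the sheet" part, one possible approach is a loop that reloads the data until no duplicates remain. The following is only a rough sketch: load_sheet is a hypothetical stand-in for however you actually pull the Google Sheet, and find_duplicates, load_until_clean and ask are names made up for illustration.

```python
import pandas as pd


def find_duplicates(df, column="record_id"):
    """Return every row whose value in `column` appears more than once."""
    mask = df.duplicated(subset=column, keep=False)
    return df[mask]


def load_until_clean(load_sheet, column="record_id", ask=input):
    """Reload the data until `column` contains no duplicates.

    `load_sheet` is a hypothetical zero-argument callable returning a fresh
    DataFrame (for example, a function that re-reads the Google Sheet).
    `ask` is the prompt function; it defaults to the built-in input().
    """
    while True:
        df = load_sheet()
        dupes = find_duplicates(df, column)
        if dupes.empty:
            return df
        # Show the user exactly which rows need fixing, then wait.
        print(f"Duplicate values found in '{column}':")
        print(dupes)
        ask("Fix these rows in the sheet, then press Enter to re-check... ")


# Stand-in for the real sheet: the first load contains a duplicate,
# the second simulates the user having fixed it.
_versions = iter([
    pd.DataFrame({"record_id": ["abc1", "abc2", "abc3", "abc1"]}),
    pd.DataFrame({"record_id": ["abc1", "abc2", "abc3", "abc4"]}),
])

# `ask` is replaced by a no-op here so the example runs without a prompt.
clean_df = load_until_clean(lambda: next(_versions), ask=lambda prompt: "")
```

In real use you would pass your own loader and leave ask at its default so the script pauses until the user presses Enter. Filtering with the boolean mask directly also means you do not need to add the extra record_id_duplicate column unless you want to keep it.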